Discussion:
gem5 versus MARSS
Payne, Benjamin
2012-10-22 21:06:30 UTC
Hello,

What is the difference between gem5
http://gem5.org/Main_Page
and MARSS (Micro-ARchitectural and System Simulator for x86-based Systems)
http://marss86.org/~marss86/index.php/Home

As far as I can tell,
-gem5 can support Alpha, ARM, SPARC, and x86 instruction set architectures, whereas MARSS is only for x86.
-gem5 can be integrated into Structural Simulation Toolkit, whereas MARSS has not been
-both gem5 and MARSS can simulate multiple cores
-both gem5 and MARSS can use DRAMSim2

Please correct me if any of these statements are incorrect.

Are there any other considerations?

Thank you,


Ben Payne
http://mst.edu/~bhpxc9/
Suite 450, Room S452
5520 Research Park Drive
Catonsville, MD 21228-4870
Laboratory for Physical Sciences
http://www.lps.umd.edu/
office: 443-654-7890
cell: 608-308-2413
Andreas Hansson
2012-10-22 22:56:30 UTC
Hi Benjamin,

The list is long…gem5 has (amongst other things):

a variety of CPU models that are orthogonal to the ISA: atomic for speed, and in-order and O3 (out-of-order) for detailed microarchitectural models

BSD license (thus both academia and companies involved and contributing)

full-system ready-to-run Android disk images and configurations, not just your average chip-multi-processor, but also heterogeneous application-processor-like systems with state-of-the-art CPU models

a very active (and large) user community


Ultimately using one or the other really depends on what problem it is you want to address.

Andreas

Hamid Reza Khaleghzadeh
2012-10-23 13:30:57 UTC
I have a question about MARSS. As you know, gem5 simulation with the Ruby
module is very slow. How fast is MARSS simulation?

Thanks
--
Hamid Reza Khaleghzadeh
http://hkhaleghzadeh.webs.com
Payne, Benjamin
2012-10-23 13:57:12 UTC
Hello,

I'm not familiar with what you are referring to as the Ruby module. Is that an add-on for gem5?

That's a good question, but how would I quantify the difference in simulation speed between MARSS and gem5? Is there an established benchmark to run?

Kindly,


Ben Payne

Hamid Reza Khaleghzadeh
2012-10-23 14:25:33 UTC
Thanks for your answer. Ruby is a module in gem5 which simulates the memory
hierarchy. Suppose there is an application whose execution time is 20 ms
on a real system. gem5 simulates the application in about 15 min. How is
the MARSS86 simulation speed?
Payne, Benjamin
2012-10-24 14:46:39 UTC
Prompted by Hamid's question about simulation speed comparison with MARSS, I wrote a small benchmark (see bottom of this email), then compiled and ran it within gem5 full-system simulation using the disk image
http://www.gem5.org/dist/current/arm/arm-system-2011-08.tar.bz2
The gem5 configuration uses all the defaults:
build/ARM/gem5.opt configs/example/fs.py --disk-image=/home/bpayne/full_system_for_gem5/disks/arm-ubuntu-natty-headless.img

The boot time for full-system mode (how long until I'm at the login terminal via telnet) is 23 minutes.

In full-system mode, I see the following output (my binary is called "a.out")

***@gem5sim:~# date; time ./a.out; date
date; time ./a.out; date
Wed Dec 31 20:49:26 CST 1969
CPU time= 0.210000 seconds
real 0m0.216s
user 0m0.060s
sys 0m0.150s
Wed Dec 31 20:49:27 CST 1969
***@gem5sim:~#

The wall-clock time (how long I wait for the simulated system) is about 4 minutes. Thus the slowdown is a factor of (4*60)/0.2 = 1200, which is consistent with previous runs I've done.

Next I ran the same code in syscall emulation mode, cross-compiled using Linaro for ARM. This took 168 seconds of wall-clock time and 0.07 seconds of simulated time, a ratio of (2*60+48)/0.07 = 2400 [twice the ratio measured in full-system mode, even though the wall-clock time is shorter]. I repeated the same measurement with bench.c cross-compiled for ARM using Mentor Graphics Sourcery Tools. The syscall emulation took 162 wall-clock seconds and 0.06 simulated seconds, a ratio of 2700. [These numbers may be somewhat inaccurate because the simulated time is so short.] Below is how I captured the times in syscall emulation mode.

***@bpayne-VirtualBox64:~/gem5$ date; time build/ARM/gem5.opt configs/example/se.py -c tests/test-progs/bens_benchmark/bin/arm/bench_linaro.lex ; date
Wed Oct 24 08:24:13 EDT 2012
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Oct 16 2012 13:57:10
gem5 started Oct 24 2012 08:24:13
gem5 executing on bpayne-VirtualBox64
command line: build/ARM/gem5.opt configs/example/se.py -c tests/test-progs/bens_benchmark/bin/arm/readwrite_linaro.lex
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
**** REAL SIMULATION ****
info: Entering event queue @ 0. Starting simulation...
CPU time= 0.070000 seconds
hack: be nice to actually delete the event here
Exiting @ tick 73563381000 because target called exit()
real 2m48.492s
user 1m47.319s
sys 0m2.792s
Wed Oct 24 08:27:01 EDT 2012
***@bpayne-VirtualBox64:~/gem5$

**************************************

Next I ran the same bench.c code in MARSS using the system image
http://bertha.cs.binghamton.edu/downloads/ubuntu-natty.tar.bz2

The boot time for full-system mode of MARSS (how long until I'm at the login terminal via VNC) is 42 seconds (about 33 times faster than the 23 minutes for gem5).
I compiled a static binary of bench.c and ran it in MARSS:

***@ubuntu:~# date; time ./bench.lex ; date
Wed Oct 24 14:38:25 UTC 2012
CPU time= 8.66 seconds
real 0m8.752s
user 0m1.200s
sys 0m7.490s
Wed Oct 24 14:38:34 UTC 2012
***@ubuntu:~#

The wall-clock time for this simulation is roughly 9 seconds. The CPUs are different, so it doesn't make sense to compare MARSS's 8.66 seconds to gem5's 0.07 seconds. What is relevant is the slowdown factor (wall-clock time divided by the time reported inside the simulated system): roughly 1 for MARSS, between 1200 and 2700 for gem5.
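
The ratios quoted above can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the function name slowdown_ratio is made up for illustration, and the inputs are the approximate wall-clock and guest-reported times from the runs described in this message.

```c
/* Sketch only: slowdown_ratio is an illustrative helper, not part of
 * gem5 or MARSS. */

/* Host wall-clock seconds divided by the seconds reported inside the
 * simulated system (for example by `time` or clock() in the guest). */
double slowdown_ratio(double wall_seconds, double guest_seconds)
{
    return wall_seconds / guest_seconds;
}
```

For example, slowdown_ratio(4 * 60, 0.2) gives about 1200 (gem5 full-system), slowdown_ratio(168, 0.07) about 2400 and slowdown_ratio(162, 0.06) about 2700 (gem5 syscall emulation), and slowdown_ratio(9, 8.66) about 1.04 (MARSS).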

**************************************

All of these timings were carried out in Ubuntu 12.04 64-bit running in a single-CPU VirtualBox VM, hosted by Ubuntu 12.04 64-bit on an Intel Core i7 930 @ 2.80 GHz. The host system has 6 GB of RAM, and the VirtualBox VM has 2 GB.

"bench.c" is a program to load the CPU and file I/O

/* benchmark
 * 20121018
 * Ben Payne
 * load CPU and file I/O
 */

#include <stdio.h>
#include <time.h>

int main(void)
{
    int number_of_computes;
    int number_of_read_writes;
    int number_of_iterations;
    int iteration_indx;
    int read_write_indx;
    int compute_indx;
    int valu;
    int temp_read;
    clock_t time_start, time_end;
    double cpuTime;
    FILE *outfile;
    FILE *infile;

    time_start = clock();
    number_of_computes = 500;
    number_of_read_writes = 100;
    number_of_iterations = 100;

    for (iteration_indx = 1; iteration_indx <= number_of_iterations; iteration_indx++)
    {
        for (read_write_indx = 1; read_write_indx <= number_of_read_writes; read_write_indx++)
        {
            /* append to the file, creating it if it does not exist */
            outfile = fopen("out.dat", "a+");
            fprintf(outfile, "%d\n", read_write_indx); /* write */
            fclose(outfile);

            /* compute load */
            for (compute_indx = 1; compute_indx <= number_of_computes; compute_indx++)
            {
                valu = (compute_indx + 1) * 23;
            }

            /* read the first value back from the file */
            infile = fopen("out.dat", "r");
            fscanf(infile, "%d", &temp_read);
            fclose(infile);
        }
    }

    time_end = clock();
    cpuTime = ((double)(time_end - time_start)) / (CLOCKS_PER_SEC);
    printf("CPU time= %f seconds\n", cpuTime);
    return 0;
}




Steve Reinhardt
2012-10-24 15:18:14 UTC
Thanks for the benchmarking effort, Ben. These are interesting numbers,
but before people read too much into them I thought I'd throw out some
caveats:

- A much better way to measure slowdown is to compare the execution time in
the simulator with the execution time on a real system. The reported
simulated runtime (i.e., what you're getting from running 'time' in the
simulator itself) reflects whatever configuration you're modeling, which
may or may not be realistic (and if you're not running a detailed timing
model, it's unlikely to be realistic). That is, the wall-clock simulation
runtime is going to be the same whether I configure the simulated CPU to
run at a simulated 2 GHz or 2 kHz, but the slowdown/speedup as you've
calculated it would be very different.

- OS boot speed is a useful number to have, but not a representative
workload for looking at typical simulation jobs. Generally when people use
FS mode in gem5 they boot the OS once, take a checkpoint, and run their
simulations from there. Also, though I haven't run FS mode myself
recently, 23 minutes sounds extremely slow; my recollection is that boot is
pretty fast (just a few minutes). Part of that is also that we typically
boot a more stripped-down image and not a full install (which is typically
unnecessary for benchmarking). Also, there are delay loops that we skip
that might not be properly skipped if you're using a different kernel image.

- You need to make sure that the level of detail of the simulation model is
the same in both cases, and probably do a comparison at multiple levels
(e.g., fast functional simulation vs. detailed out-of-order CPU and caches).
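
To illustrate the first caveat, here is a rough sketch in C. It assumes a deliberately simple model in which the workload takes a fixed number of simulated cycles regardless of the configured clock; the function names and the cycle count in the usage note are made up for illustration and are not taken from gem5 or MARSS.

```c
/* Sketch only: all names here are illustrative helpers. */

/* Seconds of simulated time for a fixed cycle count at a configured clock. */
double simulated_seconds(double cycles, double clock_hz)
{
    return cycles / clock_hz;
}

/* Ratio of host wall-clock time to guest-reported time. This depends on
 * the simulated clock frequency that happens to be configured. */
double guest_time_ratio(double host_wall_seconds, double cycles, double clock_hz)
{
    return host_wall_seconds / simulated_seconds(cycles, clock_hz);
}

/* Host wall-clock time inside the simulator divided by the wall-clock
 * time of the same program on real hardware. This does not depend on
 * the simulated clock frequency. */
double native_slowdown(double sim_wall_seconds, double native_wall_seconds)
{
    return sim_wall_seconds / native_wall_seconds;
}
```

With 168 seconds of host time and a hypothetical 1.4e8 simulated cycles, guest_time_ratio returns about 2400 at a 2 GHz simulated clock but about 0.0024 at 2 kHz, although the host did the same amount of work. native_slowdown is unaffected by that setting: for the 20 ms application that takes about 15 minutes in gem5, mentioned earlier in this thread, native_slowdown(15 * 60, 0.020) is about 45000.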

I don't mean to sound overly critical or like I'm making excuses... I
expect MARSS probably is faster than gem5, particularly for fast functional
simulation, because they seem to focus a lot on speed while we focus more
on flexibility and modularity. (Though there has been some work on using
KVM to provide extremely fast functional modeling for gem5, which should
make up a lot of the difference for that mode of operation.) I just want
to make sure that the comparisons are fair and meaningful.

Thanks,

Steve
Paul Rosenfeld
2012-10-24 15:43:11 UTC
Another thing to note is that in the master branch of marss, you can expect
the slowdown for running more cores to be pretty much linear (not sure if
this is the case with gem5). QEMU emulates each core in sequence so as you
add cores, the simulation time goes up linearly. They do have some
experimental extensions for multithreading the core execution, but I'm not
sure how much speedup you can claw back (I haven't used it myself).
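
As a first-order sketch of that linear scaling, assuming every simulated core has roughly the same amount of work and the cores are emulated strictly one after another (the helper name is made up for illustration):

```c
/* Sketch only: a first-order estimate, not a measurement of marss. */

/* Estimated host wall-clock seconds when `cores` simulated cores are
 * emulated sequentially and each costs about as much as a single core. */
double sequential_wall_seconds(double single_core_wall_seconds, int cores)
{
    return single_core_wall_seconds * (double)cores;
}
```

Under these assumptions, a run that needs 9 seconds of host time with one core would be estimated at 36 seconds with four cores.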

Overall, I think the decision comes down to how flexible your modelling
needs are. If you need to run your experiment across multiple ISAs or on
some non-x86 ISA (or if you need the full coherence modeling power of
ruby), I'd say gem5 is your best bet. However, for x86 simulation, I
personally tend to lean towards marss -- it has the benefit of picking one
target and trying to do it well, which can make it much easier to
understand how to change the code.

One more thing to consider is that if your simulation is device-centric
(hard disk, network card), you might want to find out the finer points of
how marss handles these things. IIRC, since QEMU handles device emulation,
it might be a bit difficult to get good simulation data on the effects of
things like NICs and disks without doing some work first.

Also, to comment on Steve's point about the level of CPU model detail being
the same, that is also another difference between marss and gem5: there
isn't really a way to do a functional simulation in marss -- you're pretty
much always stuck with the full-on detailed model. They have an out of
order model which models the full superscalar out of order pipeline and
they have a simple "Intel Atom-like" model which is much simpler (in
order), but that's pretty much the only knob you get in terms of detail.

I'd agree with Steve's point about the boot time being a smaller issue
since both simulators support the "checkpoint at the start of the
simulation" option. That said, I found myself screwing around with the
actual disk images and benchmarks more than I thought I'd have to (mostly
in things like tweaking the parameters to benchmarks, trying to write new
micro benchmarks that would inevitably end up doing something incorrectly
and I'd have to recompile them and re-checkpoint them).

I hope I don't sound like I'm a marss cheerleader, but since you asked this
question on the gem5 list, I feel like someone should try to balance out
the picture a bit.

-Paul
Post by Steve Reinhardt
Thanks for the benchmarking effort, Ben. These are interesting numbers,
but before people read too much into them I thought I'd throw out some
caveats:
- A much better way to measure slowdown is to compare the execution time
in the simulator with the execution time on a real system. The reported
simulated runtime (i.e., what you're getting from running 'time' in the
simulator itself) reflects whatever configuration you're modeling, which
may or may not be realistic (and if you're not running a detailed timing
model, it's unlikely to be realistic). That is, the wall-clock simulation
runtime is going to be the same whether I configure the simulated CPU to
run at a simulated 2 GHz or 2 kHz, but the slowdown/speedup as you've
calculated it would be very different.
- OS boot speed is a useful number to have, but not a representative
workload for looking at typical simulation jobs. Generally when people use
FS mode in gem5 they boot the OS once, take a checkpoint, and run their
simulations from there. Also, though I haven't run FS mode myself
recently, 23 minutes sounds extremely slow; my recollection is that boot is
pretty fast (just a few minutes). Part of that is also that we typically
boot a more stripped-down image and not a full install (which is typically
unnecessary for benchmarking). Also, there are delay loops that we skip
that might not be properly skipped if you're using a different kernel image.
- You need to make sure that the level of detail of the simulation model
is the same in both cases, and probably do a comparison at multiple levels
(e.g., fast functional simulation vs. detailed out-of-order CPU and caches).
I don't mean to sound overly critical or like I'm making excuses... I
expect MARSS probably is faster than gem5, particularly for fast functional
simulation, because they seem to focus a lot on speed while we focus more
on flexibility and modularity. (Though there has been some work on using
KVM to provide extremely fast functional modeling for gem5, which should
make up a lot of the difference for that mode of operation.) I just want
to make sure that the comparisons are fair and meaningful.
Thanks,
Steve
Post by Payne, Benjamin
Prompted by Hamid's question about simulation speed comparison with
MARSS, I wrote a small benchmark (see bottom of this email), then compiled
and ran it within the gem5 full system emulation using the disk image
http://www.gem5.org/dist/current/arm/arm-system-2011-08.tar.bz2
The gem5 configuration is with all the defaults,
build/ARM/gem5.opt configs/example/fs.py
--disk-image=/home/bpayne/full_system_for_gem5/disks/arm-ubuntu-natty-headless.img
The boot time for full simulation mode (how long until I'm at the login
terminal via telnet) is 23 minutes.
In full simulation mode, I see the following output (my binary is called "a.out")
date; time ./a.out; date
Wed Dec 31 20:49:26 CST 1969
CPU time= 0.210000 seconds
real 0m0.216s
user 0m0.060s
sys 0m0.150s
Wed Dec 31 20:49:27 CST 1969
The wall clock time (how long I wait for the simulated system) is about 4
minutes. Thus the slowdown is a factor of (4*60)/.2=1200, which is
consistent with previous runs I've done.
Next I ran the same code in syscall emulation mode, cross compiled using
Linaro for ARM. This took 168 seconds of wall clock time and 0.07 seconds
of simulated time, a ratio of (2*60+48)/0.07=2400 [twice as fast as full
system emulation!]. I repeated the same measure with bench.c cross-compiled
for ARM using Mentor Graphics Sourcery Tools. The syscall emulation took
162 wall clock seconds and 0.06 simulation seconds, a ratio of 2700. [These
numbers may be somewhat inaccurate due to the low simulation time.] Below
is how I captured the times in syscall emulation mode.
date; time build/ARM/gem5.opt configs/example/se.py -c
tests/test-progs/bens_benchmark/bin/arm/bench_linaro.lex ; date
Wed Oct 24 08:24:13 EDT 2012
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Oct 16 2012 13:57:10
gem5 started Oct 24 2012 08:24:13
gem5 executing on bpayne-VirtualBox64
command line: build/ARM/gem5.opt configs/example/se.py -c
tests/test-progs/bens_benchmark/bin/arm/readwrite_linaro.lex
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
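**** REAL SIMULATION ****
info: Entering event queue @ 0. Starting simulation...
CPU time= 0.070000 seconds
hack: be nice to actually delete the event here
Exiting @ tick 73563381000 because target called exit()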
real 2m48.492s
user 1m47.319s
sys 0m2.792s
Wed Oct 24 08:27:01 EDT 2012
**************************************
Next I ran the same bench.c code in MARSS using the system image
http://bertha.cs.binghamton.edu/downloads/ubuntu-natty.tar.bz2
The boot time for full simulation mode of MARSS (how long until I'm at
the login terminal via VNC) is 42 seconds (33 times faster than gem5).
I compiled a static binary of bench.c and ran it in MARSS:
date; time ./bench.lex ; date
Wed Oct 24 14:38:25 UTC 2012
CPU time= 8.66 seconds
real 0m8.752s
user 0m1.200s
sys 0m7.490s
Wed Oct 24 14:38:34 UTC 2012
The wall clock time for this simulation is roughly 9 seconds. The CPUs
are different, so it doesn't make sense to compare MARSS's 8.66 seconds to
gem5's 0.07 seconds. What is relevant is the slowdown factor -- 1 for
MARSS, between 1200 and 2700 for gem5.
**************************************
All of these timings were carried out in Ubuntu 12.04 64-bit running in a
single-CPU VirtualBox, hosted by Ubuntu 12.04 64-bit with Intel Core i7 930
@ 2.80 GHz. The host system has 6GB of RAM, and the VirtualBox has 2GB.
"bench.c" is a program to load the CPU and file I/O
/* benchmark
 * 20121018
 * Ben Payne
 * load CPU and file I/O
 */
#include <stdio.h>
#include <time.h>

int main(void)
{
    int number_of_computes = 500;
    int number_of_read_writes = 100;
    int number_of_iterations = 100;
    int iteration_indx;
    int read_write_indx;
    int compute_indx;
    int valu;
    int temp_read;
    clock_t time_start, time_end;
    double cpuTime;
    FILE *outfile;
    FILE *infile;

    time_start = clock();

    for (iteration_indx = 1; iteration_indx <= number_of_iterations;
         iteration_indx++)
    {
        for (read_write_indx = 1; read_write_indx <= number_of_read_writes;
             read_write_indx++)
        {
            /* append to the file, creating it if it does not exist */
            outfile = fopen("out.dat", "a+");
            if (outfile == NULL)
            {
                perror("out.dat");
                return 1;
            }
            fprintf(outfile, "%d\n", read_write_indx); /* writes */
            fclose(outfile);

            /* CPU load; valu is never read, so an optimizing compiler
             * may remove this loop */
            for (compute_indx = 1; compute_indx <= number_of_computes;
                 compute_indx++)
            {
                valu = (compute_indx + 1) * 23;
            }

            /* read back the first value in the file */
            infile = fopen("out.dat", "r");
            if (infile == NULL)
            {
                perror("out.dat");
                return 1;
            }
            if (fscanf(infile, "%d", &temp_read) != 1)
            {
                fprintf(stderr, "could not read from out.dat\n");
            }
            fclose(infile);
        }
    }

    time_end = clock();
    cpuTime = ((double)(time_end - time_start)) / CLOCKS_PER_SEC;
    printf("CPU time= %f seconds\n", cpuTime);
    return 0;
}
On Behalf Of Hamid Reza Khaleghzadeh
Sent: Tuesday, October 23, 2012 10:26 AM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS
Thanks for your answer. Ruby is a module in gem5 that simulates the memory
hierarchy. Suppose an application takes 20 ms to run on a real system, and
gem5 simulates it in about 15 minutes. How fast is MARSS86 by comparison?
Hello,
I'm not familiar with what you are referring to by the Ruby module - is
that an add-on for gem5?
You have a good question, but how would I quantify the difference in
simulation speeds between MARSS and gem5? Is there an established benchmark
to run?
Kindly,
Ben Payne
On Behalf Of Hamid Reza Khaleghzadeh
Sent: Tuesday, October 23, 2012 9:31 AM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS
I have a question about MARSS. As you know, gem5 simulation with the
Ruby module is very slow. How fast is MARSS simulation?
Thanks
Hi Benjamin,
The list is long... gem5 has (amongst other things):
-a variety of CPU models that are orthogonal to the ISA: atomic for speed,
in-order and O3 for detailed uarch models
-BSD license (thus both academia and companies involved and contributing)
-full-system ready-to-run Android disk images and configurations, not just
your average chip-multi-processor, but also heterogeneous
application-processor-like systems with state-of-the-art CPU models
-a very active (and large) user community
Ultimately using one or the other really depends on what problem it is
you want to address.
Andreas
Date: Monday, 22 October 2012 22:06
Subject: [gem5-users] gem5 versus MARSS
Hello,
What is the difference between gem5
http://gem5.org/Main_Page
and MARSS (Micro-ARchitectural and System Simulator for x86-based Systems)
http://marss86.org/~marss86/index.php/Home
As far as I can tell,
-gem5 can support Alpha, ARM, SPARC, and x86 instruction set
architectures, whereas MARSS is only for x86.
-gem5 can be integrated into Structural Simulation Toolkit, whereas MARSS has not been
-both gem5 and MARSS can simulate multiple cores
-both gem5 and MARSS can use DRAMSim2
Please correct me if any of these statements are incorrect.
Are there any other considerations?
Thank you,
Ben Payne
http://mst.edu/~bhpxc9/
Suite 450, Room S452
5520 Research Park Drive
Catonsville, MD 21228-4870
Laboratory for Physical Sciences
http://www.lps.umd.edu/
office: 443-654-7890
cell: 608-308-2413
-- IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended
recipient, please notify the sender immediately and do not disclose the
contents to any other person, use it for any purpose, or store or copy the
information in any medium. Thank you.
_______________________________________________
gem5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
--
Hamid Reza Khaleghzadeh
http://hkhaleghzadeh.webs.com
Steve Reinhardt
2012-10-24 16:14:01 UTC
Permalink
Thanks for the input, Paul. No need to apologize for being pro-marss;
different tools have different strengths and there's no point in someone
using gem5 if it's not the best tool for the job. (It's not like we're
losing revenue...) I'm also very interested in their perceived
strengths---a little friendly competition keeps us all on our toes.

As far as multithreading, gem5 is also single-threaded so you'll typically
see linear slowdown when modeling MP systems as well. We have plans to
multithread the simulation engine, but much time has passed and they still
remain mostly just plans, so don't hold your breath on that.

Steve
Payne, Benjamin
2012-10-24 17:28:33 UTC
Permalink
Thanks to everyone for your helpful responses. I know the timing comparisons aren't very meaningful as measurements, but they show what a default setup of MARSS and gem5 gives out of the box. (I haven't started simulating real systems yet, so I didn't bother tuning the configurations.)

Ben

From: gem5-users-***@gem5.org [mailto:gem5-users-***@gem5.org] On Behalf Of Steve Reinhardt
Sent: Wednesday, October 24, 2012 12:14 PM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS

Thanks for the input, Paul. No need to apologize for being pro-marss; different tools have different strengths and there's no point in someone using gem5 if it's not the best tool for the job. (It's not like we're losing revenue...) I'm also very interested in their perceived strengths---a little friendly competition keeps us all on our toes.

As far as multithreading, gem5 is also single-threaded so you'll typically see linear slowdown when modeling MP systems as well. We have plans to multithread the simulation engine, but much time has passed and they still remain mostly just plans, so don't hold your breath on that.

Steve
On Wed, Oct 24, 2012 at 8:43 AM, Paul Rosenfeld wrote:
Another thing to note is that in the master branch of marss, you can expect the slowdown for running more cores to be pretty much linear (not sure if this is the case with gem5). QEMU emulates each core in sequence so as you add cores, the simulation time goes up linearly. They do have some experimental extensions for multithreading the core execution, but I'm not sure how much speedup you can claw back (I haven't used it myself).

Overall, I think the decision comes down to how flexible your modelling needs are. If you need to run your experiment across multiple ISAs or on some non-x86 ISA (or if you need the full coherence modeling power of ruby), I'd say gem5 is your best bet. However, for x86 simulation, I personally tend to lean towards marss -- it has the benefit of picking one target and trying to do it well, which can make it much easier to understand how to change the code.

One more thing to consider is that if your simulation is device-centric (hard disk, network card), you might want to find out the finer points of how marss handles these things. IIRC, since QEMU handles device emulation, it might be a bit difficult to get good simulation data on the effects of things like NICs and disks without doing some work first.

Also, to comment on Steve's point about the level of CPU model detail being the same, that is another difference between marss and gem5: there isn't really a way to do a functional simulation in marss -- you're pretty much always stuck with the full-on detailed model. They have an out-of-order model which models the full superscalar out-of-order pipeline and a simple "Intel Atom-like" in-order model which is much simpler, but that's pretty much the only knob you get in terms of detail.

I'd agree with Steve's point about the boot time being a smaller issue since both simulators support the "checkpoint at the start of the simulation" option. That said, I found myself screwing around with the actual disk images and benchmarks more than I thought I'd have to (mostly in things like tweaking the parameters to benchmarks, trying to write new micro benchmarks that would inevitably end up doing something incorrectly and I'd have to recompile them and re-checkpoint them).

I hope I don't sound like I'm a marss cheerleader, but since you asked this question on the gem5 list, I feel like someone should try to balance out the picture a bit.

-Paul



On Wed, Oct 24, 2012 at 11:18 AM, Steve Reinhardt wrote:
Thanks for the benchmarking effort, Ben. These are interesting numbers, but before people read too much into them I thought I'd throw out some caveats:

- A much better way to measure slowdown is to compare the execution time in the simulator with the execution time on a real system. The reported simulated runtime (i.e., what you're getting from running 'time' in the simulator itself) reflects whatever configuration you're modeling, which may or may not be realistic (and if you're not running a detailed timing model, it's unlikely to be realistic). That is, the wall-clock simulation runtime is going to be the same whether I configure the simulated CPU to run at a simulated 2 GHz or 2 kHz, but the slowdown/speedup as you've calculated it would be very different.

- OS boot speed is a useful number to have, but not a representative workload for looking at typical simulation jobs. Generally when people use FS mode in gem5 they boot the OS once, take a checkpoint, and run their simulations from there. Also, though I haven't run FS mode myself recently, 23 minutes sounds extremely slow; my recollection is that boot is pretty fast (just a few minutes). Part of that is also that we typically boot a more stripped-down image and not a full install (which is typically unnecessary for benchmarking). Also, there are delay loops that we skip that might not be properly skipped if you're using a different kernel image.

- You need to make sure that the level of detail of the simulation model is the same in both cases, and probably do a comparison at multiple levels (e.g., fast functional simulation vs. detailed out-of-order CPU and caches).

I don't mean to sound overly critical or like I'm making excuses... I expect MARSS probably is faster than gem5, particularly for fast functional simulation, because they seem to focus a lot on speed while we focus more on flexibility and modularity. (Though there has been some work on using KVM to provide extremely fast functional modeling for gem5, which should make up a lot of the difference for that mode of operation.) I just want to make sure that the comparisons are fair and meaningful.

Thanks,

Steve

On Wed, Oct 24, 2012 at 7:46 AM, Payne, Benjamin wrote:
Prompted by Hamid's question about simulation speed comparison with MARSS, I wrote a small benchmark (see bottom of this email), then compiled and ran it within the gem5 full system emulation using the disk image
http://www.gem5.org/dist/current/arm/arm-system-2011-08.tar.bz2
The gem5 configuration is with all the defaults,
build/ARM/gem5.opt configs/example/fs.py --disk-image=/home/bpayne/full_system_for_gem5/disks/arm-ubuntu-natty-headless.img

The boot time for full simulation mode (how long until I'm at the login terminal via telnet) is 23 minutes.

In full simulation mode, I see the following output (my binary is called "a.out")

***@gem5sim:~# date; time ./a.out; date
date; time ./a.out; date
Wed Dec 31 20:49:26 CST 1969
CPU time= 0.210000 seconds
real 0m0.216s
user 0m0.060s
sys 0m0.150s
Wed Dec 31 20:49:27 CST 1969
***@gem5sim:~#

The wall clock time (how long I wait for the simulated system) is about 4 minutes. Thus the slowdown is a factor of (4*60)/.2=1200, which is consistent with previous runs I've done.

Next I ran the same code in syscall emulation mode, cross compiled using Linaro for ARM. This took 168 seconds of wall clock time and 0.07 seconds of simulated time, a ratio of (2*60+48)/0.07=2400 [twice as fast as full system emulation!]. I repeated the same measure with bench.c cross-compiled for ARM using Mentor Graphics Sourcery Tools. The syscall emulation took 162 wall clock seconds and 0.06 simulation seconds, a ratio of 2700. [These numbers may be somewhat inaccurate due to the low simulation time.] Below is how I captured the times in syscall emulation mode.

***@bpayne-VirtualBox64:~/gem5$ date; time build/ARM/gem5.opt configs/example/se.py -c tests/test-progs/bens_benchmark/bin/arm/bench_linaro.lex ; date
Wed Oct 24 08:24:13 EDT 2012
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Oct 16 2012 13:57:10
gem5 started Oct 24 2012 08:24:13
gem5 executing on bpayne-VirtualBox64
command line: build/ARM/gem5.opt configs/example/se.py -c tests/test-progs/bens_benchmark/bin/arm/readwrite_linaro.lex
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
**** REAL SIMULATION ****
info: Entering event queue @ 0. Starting simulation...
CPU time= 0.070000 seconds
hack: be nice to actually delete the event here
Exiting @ tick 73563381000 because target called exit()
real 2m48.492s
user 1m47.319s
sys 0m2.792s
Wed Oct 24 08:27:01 EDT 2012
***@bpayne-VirtualBox64:~/gem5$

**************************************

Next I ran the same bench.c code in MARSS using the system image
http://bertha.cs.binghamton.edu/downloads/ubuntu-natty.tar.bz2

The boot time for full simulation mode of MARSS (how long until I'm at the login terminal via VNC) is 42 seconds (33 times faster than gem5).
I compiled a static binary of bench.c and ran it in MARSS:

***@ubuntu:~# date; time ./bench.lex ; date
Wed Oct 24 14:38:25 UTC 2012
CPU time= 8.66 seconds
real 0m8.752s
user 0m1.200s
sys 0m7.490s
Wed Oct 24 14:38:34 UTC 2012
***@ubuntu:~#

The wall clock time for this simulation is roughly 9 seconds. The CPUs are different, so it doesn't make sense to compare MARSS's 8.66 seconds to gem5's 0.07 seconds. What is relevant is the slowdown factor -- 1 for MARSS, between 1200 and 2700 for gem5.
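
The slowdown factors quoted in this message are all the same ratio: host wall clock seconds divided by simulated seconds. A minimal sketch of that arithmetic in C follows; the function names slowdown_factor and print_slowdown are illustrative only and are not part of gem5 or MARSS.

```c
#include <assert.h>
#include <stdio.h>

/* Slowdown factor: host wall clock seconds spent per simulated second.
 * Both arguments are in seconds; simulated_seconds must be positive. */
double slowdown_factor(double wall_clock_seconds, double simulated_seconds)
{
    assert(simulated_seconds > 0.0);
    return wall_clock_seconds / simulated_seconds;
}

/* Print one labeled slowdown figure, rounded to a whole number.
 * (Hypothetical helper, for illustration only.) */
void print_slowdown(const char *label, double wall_clock_seconds,
                    double simulated_seconds)
{
    printf("%-28s %8.0f\n", label,
           slowdown_factor(wall_clock_seconds, simulated_seconds));
}
```

Feeding in the measurements above (240 s / 0.2 s for gem5 full system, 168 s / 0.07 s and 162 s / 0.06 s for gem5 syscall emulation, 9 s / 8.66 s for MARSS) gives roughly 1200, 2400, 2700 and 1 respectively. With simulated times this short, the ratios are only approximate.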

**************************************

All of these timings were carried out in Ubuntu 12.04 64-bit running in a single-CPU VirtualBox guest, hosted by Ubuntu 12.04 64-bit on an Intel Core i7 930 @ 2.80 GHz. The host system has 6 GB of RAM, and the VirtualBox guest has 2 GB.

"bench.c" is a program to load the CPU and file I/O

/* benchmark
 * 20121018
 * Ben Payne
 * load CPU and file I/O
 */

#include <stdio.h>
#include <time.h>

int main(void)
{
    int number_of_computes;
    int number_of_read_writes;
    int number_of_iterations;
    int iteration_indx;
    int read_write_indx;
    int compute_indx;
    int valu;
    int temp_read;
    clock_t time_start, time_end;
    double cpuTime;
    FILE *outfile;
    FILE *infile;

    time_start = clock();
    number_of_computes = 500;
    number_of_read_writes = 100;
    number_of_iterations = 100;

    for (iteration_indx = 1; iteration_indx <= number_of_iterations; iteration_indx++)
    {
        for (read_write_indx = 1; read_write_indx <= number_of_read_writes; read_write_indx++)
        {
            /* append to the file, creating it if it does not exist */
            outfile = fopen("out.dat", "a+");
            if (outfile == NULL) { perror("out.dat"); return 1; }
            fprintf(outfile, "%d\n", read_write_indx); /* write */
            fclose(outfile);

            /* compute load; valu is never read, so an optimizing
             * compiler may remove this loop */
            for (compute_indx = 1; compute_indx <= number_of_computes; compute_indx++)
            {
                valu = (compute_indx + 1) * 23;
            }

            /* read back the first value in the file */
            infile = fopen("out.dat", "r");
            if (infile == NULL) { perror("out.dat"); return 1; }
            fscanf(infile, "%d", &temp_read);
            fclose(infile);
        }
    }

    time_end = clock();
    /* clock() reports CPU time, not wall clock time */
    cpuTime = ((double)(time_end - time_start)) / CLOCKS_PER_SEC;
    printf("CPU time= %f seconds\n", cpuTime);
    return 0;
}




From: gem5-users-***@gem5.org [mailto:gem5-users-***@gem5.org] On Behalf Of Hamid Reza Khaleghzadeh
Sent: Tuesday, October 23, 2012 10:26 AM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS
Thanks for your answer. Ruby is a module in GEM5 that simulates the memory hierarchy. Suppose there is an application whose execution time is 20 ms on a real system. GEM5 simulates the application in about 15 min. How is MARSS86's simulation speed?
On Tue, Oct 23, 2012 at 5:27 PM, Payne, Benjamin <***@lps.umd.edu> wrote:
Hello,

I'm not familiar with what you are referring to by the ruby module - is that an addon for Gem5?

You have a good question, but how would I quantify the difference in simulation speeds between MARSS and Gem5? Is there an established benchmark to run?

Kindly,


Ben Payne

From: gem5-users-***@gem5.org [mailto:gem5-users-***@gem5.org] On Behalf Of Hamid Reza Khaleghzadeh
Sent: Tuesday, October 23, 2012 9:31 AM
To: gem5 users mailing list
Subject: Re: [gem5-users] gem5 versus MARSS

I have a question about MARSS. As you know GEM5 simulation speed with ruby module is very slow. May I know MARSS simulation speed?

Thanks
_______________________________________________
gem5-users mailing list
gem5-***@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users



Paul Rosenfeld
2012-10-23 19:49:48 UTC
Permalink
As someone who has used (and tried to modify) both marssx86 and gem5, I
would like to add one (potential) benefit to the marssx86 side of things:
the emulation mode (via QEMU) allows you to boot the system very quickly up
to a region of interest and take a checkpoint right before the simulation
launch point. From what I understand, the full system boot process is much
slower in gem5 even if booting with a simple CPU model and then picking up
with the O3 model.

Additionally, since you mentioned DRAMSim2, it might be worthwhile to note
that even with DRAMSim2, gem5 does not (currently) support putting
backpressure on the CPU (see:
http://www.mail-archive.com/gem5-***@gem5.org/msg03792.html), which,
depending on what you're trying to do, may or may not be an important
consideration.

Finally, one last thing to add to your list, marss doesn't support any kind
of syscall emulation mode (i.e. you have to run full system mode all the
time) whereas gem5 does.

-Paul
Steve Reinhardt
2012-10-23 23:48:54 UTC
Permalink
Just a clarification on Paul's second point: the issue described in the
email he's linked to strictly refers to backpressure for TLB misses (i.e.,
pagetable walks). There definitely is backpressure on the CPU for regular
memory accesses.

Steve
Paul Rosenfeld
2012-10-24 00:33:03 UTC
Permalink
Ah, I see. Sorry, I misunderstood that point.
David Roberts
2013-01-03 18:19:00 UTC
Permalink
Hello,

Paul referred to gem5 not being able to put backpressure on the CPU from the main memory, citing this topic:
http://www.mail-archive.com/gem5-***@gem5.org/msg03792.html
According to this post, there is no limit to the number of concurrent page walks. Do page walks go through the main memory system, and can they reach DRAM by the normal mechanisms? Is the statement about the lack of backpressure true? It seems hard to believe, because response latency is being modeled.

Thanks

Dave
Ali Saidi
2013-01-03 18:51:33 UTC
Permalink
Hi Dave,

There isn't a limit on the number of pending page walks.
At least for ARM, only a single walk can be active at a time. Because the
walk does take time, back pressure is applied to the CPU, but the CPU
can request as many walks as it would like. Since the original posting, a bug
was fixed in which instructions that were squashed with a pending walk
would still have that walk occur. With that fix, the number of
pending walks is rather small.


Thanks,

Ali
Paul Rosenfeld
2013-01-04 16:49:21 UTC
Permalink
Sorry about that; I hope that people finding that thread later on read the
followup post where I stand corrected.