Discussion:
Checkpointing PARSEC benchmark in m5.
Bhushan
2010-04-02 22:03:12 UTC
Hi,
I'm trying to use the checkpoint feature in m5 for the benchmarks in the
PARSEC suite. In the first run, the checkpoint gets created and in the
second run when I try to run in detailed mode using the restore checkpoint
option, I get some errors.

first run - creating checkpoint - successful.
# ./build/ALPHA_FS/m5.opt ./configs/example/fs.py -n 1 --script=./scripts/blackscholes_64c_simdev_ckpts.rcS


second run - running in detailed mode:
# ./build/ALPHA_FS/m5.opt ./configs/example/fs.py --detailed --caches --l2cache --checkpoint-restore=1 -n 1
..............
..............
Switch at curTick count:10000
info: Entering event queue @ 2254485270500. Starting simulation...
m5.opt: build/ALPHA_FS/sim/simulate.cc:68: SimLoopExitEvent* simulate(Tick): Assertion `curTick <= mainEventQueue.nextTick() && "event scheduled in the past"' failed.
Program aborted at cycle 2254485270500


The benchmarks in the PARSEC suite run fine if I do not use the
checkpointing feature.


Also, I have been trying to understand how exactly checkpointing is invoked.

How does m5 know where the ROI starts? Where in the scripts does m5 create
a checkpoint? If these questions sound repetitive, could anyone point me to
the mailing list discussions that explain checkpointing? (The references to
checkpointing in the mail archive seem to explain specific cases rather than
how it works in general.)
--
Regards,
Bhushan
ef
2010-04-02 22:10:34 UTC
A good explanation of ROI and checkpoints can be found here; also check out
the PARSEC benchmark code:
http://www.cs.utexas.edu/~parsec_m5/TR-09-32.pdf
Basically, the ROI (region of interest) is where the parallel portion of
execution is done, excluding startup and exit code.


I've also heard (take this with a grain of salt) that running 64 cores and
trying to restore a checkpoint is a bit buggy. I think other people have had
similar problems.
_______________________________________________
m5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
Bhushan
2010-04-05 00:22:26 UTC
Any thoughts on the crash that I have mentioned?
--
Regards,
Bhushan
Joel Hestness
2010-04-05 17:07:12 UTC
Hi Bhushan,
We haven't run across this issue in the context of checkpointing using our
infrastructure. However, I did have this problem when I was trying to
modify M5, and I believe it's caused by scheduling issues in the event queue.
I have a couple of questions:
1) Have you made any modifications to M5? If so, are they timing
related?
2) Are you using the same simulated system and kernel image?
Thanks,
Joel
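
The assertion in the log checks that the next event in the queue is not earlier than the current tick. As a rough illustration of that invariant only (a toy sketch in Python, not M5's actual event queue; ToyEventQueue and its method names are hypothetical, made up for this example, and the tick values are arbitrary), an event left in the queue with a timestamp earlier than the starting tick trips the same kind of check:

```python
import heapq


class ToyEventQueue:
    """Hypothetical, minimal event queue; not M5 code."""

    def __init__(self, cur_tick=0):
        # cur_tick plays the role of the tick the simulation resumes from.
        self.cur_tick = cur_tick
        self._events = []  # min-heap of (when, name) tuples

    def schedule(self, when, name):
        # The toy accepts any timestamp; the check happens in simulate().
        heapq.heappush(self._events, (when, name))

    def next_tick(self):
        # Timestamp of the earliest pending event, or None if the queue is empty.
        return self._events[0][0] if self._events else None

    def simulate(self):
        # Service events in time order, enforcing the same invariant
        # that the assertion message in the log describes.
        serviced = []
        while self._events:
            assert self.cur_tick <= self.next_tick(), "event scheduled in the past"
            when, name = heapq.heappop(self._events)
            self.cur_tick = when
            serviced.append((when, name))
        return serviced
```

With a queue started at tick 100, events scheduled at 120 and 150 are serviced in order, while an event scheduled at tick 50 makes simulate() raise AssertionError. Whether something similar is happening in the restore path here is not established by this thread.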


---------- Forwarded message ----------
From: Mark Gebhart <***@cs.utexas.edu>
Date: Sun, Apr 4, 2010 at 8:24 PM
Subject: Re: [m5-users] Checkpointing PARSEC benchmark in m5.
To: Joel Hestness <***@gmail.com>


Hey Joel,

I haven't ever seen this error. It looks from his commands like he
is doing everything right, so I am not sure why it is crashing.

Mark
Hey Mark,
Do you have any insights on this?
Thanks,
Joel
--
Joel Hestness
PhD Student, Computer Architecture
Dept. of Computer Science, University of Texas - Austin
http://www.cs.utexas.edu/~hestness
Bhushan
2010-04-05 18:09:21 UTC
Post by Joel Hestness
Hi Bhushan,
We haven't run across this issue in the context of checkpointing using
our infrastructure. However, I did have this problem when I was trying to
modify M5, and I believe it's caused by scheduling issues in the event queue.
1) Have you made any modifications to M5? If so, are they timing
related?
No, I haven't modified M5.
Post by Joel Hestness
2) Are you using the same simulated system and kernel image?
Yes, the kernel image and the PARSEC binaries are the same as the ones
mentioned in the tech report. By the way, fast-forwarding works without any
issues. Any idea how I can localize the crash? Would unit testing help?

Thanks,
--
Regards,
Bhushan