Thanks a lot for the tips. I will give it a try.
Post by Gutierrez, Anthony
Yes. To make sure all buffers are flushed, etc., before taking your
checkpoint, you can call the "sync" command, which should already be
installed on the image. You'll need to call sync before your commands to
halt and take a checkpoint.
http://gem5.org/BBench-gem5#Tips_for_Making_Your_Disk_Image_gem5_Friendly
-Tony
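The ordering described above (flush first, then checkpoint) could be sketched as a small generator for a guest-side run script. This is only a sketch under assumptions: `build_checkpoint_script` is a hypothetical helper, the benchmark command is a placeholder, and the `m5` utility is assumed to be installed on the guest image and on its PATH.

```python
def build_checkpoint_script(benchmark_cmd=None):
    """Return the text of a guest-side shell script that flushes disk
    buffers before asking gem5 to take a checkpoint.

    benchmark_cmd is an optional placeholder for whatever should run
    once the checkpoint has been taken (or restored).
    """
    lines = ["#!/bin/sh"]
    # Flush dirty buffers to the disk image before anything else.
    lines.append("sync")
    # Assumption: the m5 utility is available on the guest's PATH.
    lines.append("m5 checkpoint")
    if benchmark_cmd:
        lines.append(benchmark_cmd)
    # Ask the simulator to exit once the work is done.
    lines.append("m5 exit")
    return "\n".join(lines) + "\n"


script = build_checkpoint_script("echo benchmark goes here")
print(script)
```

The only point of the sketch is the order: `sync` comes before the checkpoint request, so pending writes reach the disk image first.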
*Sent:* Thursday, July 19, 2018 12:00 PM
*Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
"SIGSEGV" and "null exception" during timing mode (fs mode) after
restarting from a checkpoint
Hey Gutierrez,
"*sync* the disk image": do you mean making sure all disk modifications
are actually written to the disk (up to date) before taking the
checkpoint? How do I do that?
I haven't tried taking a checkpoint with the COW layer disabled and then
restoring from that checkpoint. All I have done is press "ctrl+c" to stop
gem5 so that it takes the checkpoint (--checkpoint-at-end); I rely on gem5
to take care of everything that needs to be handled when taking checkpoints.
Best,
Da Zhang
On Thu, Jul 19, 2018 at 2:36 PM Gutierrez, Anthony <
JIT was precisely the issue I thought was causing this. One thing that may
be necessary is to ensure you *sync* the disk image before taking
your checkpoint.
gem5's debug flags should help you identify something like a hang, for
example an ExecAll trace. A SyscallAll trace would most likely help you
understand better what the JIT is doing.
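As a rough illustration of requesting such a trace, the sketch below assembles a gem5 command line with debug flags enabled. `gem5_trace_command` is a hypothetical helper, and the trace file name is a placeholder. The binary and config paths are the ones mentioned elsewhere in this thread, and the script options are whatever the run already uses.

```python
def gem5_trace_command(debug_flags, trace_file="trace.out",
                       gem5_binary="build/X86/gem5.opt",
                       config="configs/example/fs.py",
                       config_args=()):
    """Build a gem5 command line (as an argv list) with debug flags on.

    Simulator options such as --debug-flags go before the config script;
    everything after the script name is passed to the script itself.
    """
    cmd = [
        gem5_binary,
        "--debug-flags=" + ",".join(debug_flags),
        "--debug-file=" + trace_file,  # placeholder file name
        config,
    ]
    cmd.extend(config_args)
    return cmd


cmd = gem5_trace_command(["ExecAll"],
                         config_args=["--cpu-type=DerivO3CPU"])
print(" ".join(cmd))
```

ExecAll traces grow very quickly, so it is probably worth keeping the traced window short.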
*Sent:* Thursday, July 19, 2018 11:15 AM
*Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
"SIGSEGV" and "null exception" during timing mode (fs mode) after
restarting from a checkpoint
Thanks for the suggestions.
I have been trying a couple of solutions (I only test for a small subset
1. using TimingSimpleCPU: no segfaults
there are still segfaults
3. take checkpoints with JIT compiler disabled (20x slowdown): no segfaults
4. take checkpoints during atomic mode (without warming up JIT): no segfaults
5. take checkpoints with Java compressed OOPs disabled: there are still segfaults
One thing that I can't tell is whether the benchmark hangs, since nothing
is printed during execution. Is there a statistic I can use to tell if
the benchmark hangs?
So far, all my experiments run on 1 CPU (even though some benchmarks
are multithreaded). I attempted to take some checkpoints with more CPUs
using the KVM CPU, but unfortunately I got some "rcu_sched self-detected
stall on CPU" issues. Any idea?
On Mon, Jul 16, 2018 at 5:47 PM Gutierrez, Anthony <
Da,
Do you encounter the segfault only when restoring from a checkpoint? That
is, if you do not use checkpoints can any DaCapo benchmark successfully
complete under one of the simple CPU models (and not just KVM CPU)?
If so, you may want to get a syscall trace (e.g., using strace) to see
what sorts of files the JVM is trying to read, etc. It's possible that the
VM generates some files that it will read back later. If you use
checkpoints, due to the disk image COW layer, I do not believe any disk
updates are checkpointed, thus these files will not persist, which could
lead to some weird segfault issues. Not sure if this is happening in your
case, but it may be worth investigating.
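To make the strace suggestion a bit more concrete, here is a minimal sketch that pulls file paths out of strace output so you can see which files the JVM creates or writes. The helper name and the sample lines are made up for illustration; the regular expression only assumes the usual `open(...)` / `openat(...)` line shape that strace prints.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "/tmp/x", O_RDONLY) = 3
#   open("/missing", O_RDONLY) = -1 ENOENT (No such file or directory)
OPEN_RE = re.compile(
    r'\b(?:open|openat)\((?:[^,"]+,\s*)?"([^"]+)",\s*([^)]*)\)\s*=\s*(-?\d+)'
)


def opened_files(strace_lines):
    """Yield (path, flags, result) for every open/openat call found."""
    for line in strace_lines:
        match = OPEN_RE.search(line)
        if match:
            yield match.group(1), match.group(2), int(match.group(3))


# Made-up sample lines, not taken from a real run.
sample = [
    'open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/tmp/example.xml", O_WRONLY|O_CREAT, 0644) = 4',
    'open("/missing", O_RDONLY) = -1 ENOENT (No such file or directory)',
]

results = list(opened_files(sample))
# Files that were successfully opened for writing or creation are the
# ones that would be lost if disk updates are not persisted.
written = [path for path, flags, rc in results
           if rc >= 0 and any(f in flags for f in ("O_WRONLY", "O_RDWR", "O_CREAT"))]
print(written)
```

Comparing the written paths against what exists on the disk image after a restore would show whether any expected file has disappeared.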
I created some of the original Android disk images, and the original
DaCapo image, and at that time I would typically run the benchmarks thru
the FS mode and Atomic CPU once, with the COW layer disabled, in order to
generate the needed files on the disk image and have them persist. This was
entirely for performance, however, to prevent the VMs from regenerating the
same files for each run, but I can envision it causing issues during
runtime as well. In particular, it seems your code is faulting while
doing some XML serializing/deserializing, perhaps the xml file it is
looking for is gone?
Beyond that, assuming it is a real bug in gem5, I would recommend an
ExecAll trace to figure out why the instruction at that PC is faulting.
-Tony
Zhang
*Sent:* Monday, July 16, 2018 1:50 PM
*Subject:* Re: [gem5-users] dacapo (java) benchmark suite encounters
"SIGSEGV" and "null exception" during timing mode (fs mode) after
restarting from a checkpoint
Hey Jason,
There are a bunch of "warn: instruction 'prefetch_nta' unimplemented"
messages in atomic mode, during which the Java benchmarks don't crash.
However, there are no such warnings in timing mode. Does this imply that
unimplemented instructions are not the cause of the problem? Any clues or
suggestions for debugging these problems?
best,
Da Zhang
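For comparing atomic and timing runs, a small script can tally the unimplemented-instruction warnings per instruction. This is a sketch only: `count_unimplemented` is a hypothetical helper, and the pattern is based solely on the warning text quoted above, so other warning wordings would need their own pattern.

```python
import re
from collections import Counter

# Pattern taken from the warning quoted in this thread:
#   warn: instruction 'prefetch_nta' unimplemented
WARN_RE = re.compile(r"warn: instruction '([^']+)' unimplemented")


def count_unimplemented(log_lines):
    """Count unimplemented-instruction warnings per instruction name."""
    counts = Counter()
    for line in log_lines:
        match = WARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up log excerpt for illustration.
sample_log = [
    "warn: instruction 'prefetch_nta' unimplemented",
    "some unrelated simulator output",
    "warn: instruction 'prefetch_nta' unimplemented",
]

counts = count_unimplemented(sample_log)
for name, n in counts.most_common():
    print(name, n)
```

Running it over the output of both modes would show whether the set of warned instructions really differs between them.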
Hello,
Are you seeing any warnings like "warn: Instruction XXX not implemented"?
There are many X86 SIMD instructions that are currently unimplemented. I
would bet that your application is using some of those instructions and
getting 0's as the output instead of the correct value.
The "right" way to solve this problem is to implement these instructions
(and we would really appreciate it if you contributed your fixes back on
https://gem5-review.googlesource.com). The other option is to recompile
your applications without SIMD extensions (e.g., -march=athlon64 or
whatever is the original x86-64 name in GCC). However, this likely requires
compiling all of the java runtime in your case.
Cheers,
Jason
To clarify, the "SIGSEGV" and "null exception" errors happen in the
benchmark suite, not in gem5. gem5 itself runs without errors. But in the
system.pc.com_1.device files, I observe that most of the benchmarks crash
due to SIGSEGV or null exceptions.
"
x/system.pc.com_1.device
buffers
#
#
# SIGSEGV (0xb) at pc=0x00007f81d17742b7, pid=1474, tid=0x00007f81cf46d700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_171-b11) (build 1.8.0_171-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode linux-amd64 compressed oops)
# J 1815 C2 org.apache.xml.serializer.ToHTMLStream.endElement(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)V
#
#
"
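When many benchmark outputs need to be checked, the crash line in the excerpt above can be extracted mechanically. The following is a sketch with a hypothetical helper name. The pattern is modeled only on the single SIGSEGV line quoted above, and the helper rejoins wrapped lines first because the excerpt is wrapped by the mail client.

```python
import re

# Modeled on the line quoted above:
#   # SIGSEGV (0xb) at pc=0x00007f81d17742b7, pid=1474, tid=0x00007f81cf46d700
CRASH_RE = re.compile(
    r"#\s*(SIG[A-Z]+)\s*\((0x[0-9a-fA-F]+)\)\s*at\s*pc=(0x[0-9a-fA-F]+),"
    r"\s*pid=(\d+),\s*tid=(0x[0-9a-fA-F]+|\d+)"
)


def parse_crash_header(text):
    """Return signal, code, pc, pid and tid from a JVM crash header,
    or None if no such line is present."""
    # Collapse all whitespace so a wrapped line matches as one line.
    joined = " ".join(text.split())
    match = CRASH_RE.search(joined)
    if match is None:
        return None
    signal, code, pc, pid, tid = match.groups()
    return {
        "signal": signal,
        "code": int(code, 16),
        "pc": int(pc, 16),
        "pid": int(pid),
        "tid": tid,
    }


excerpt = """# SIGSEGV (0xb) at pc=0x00007f81d17742b7, pid=1474,
tid=0x00007f81cf46d700"""

info = parse_crash_header(excerpt)
print(info["signal"], hex(info["pc"]), info["pid"])
```

Collecting the faulting PCs this way makes it easier to see whether different benchmarks crash at the same address.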
Hey guys,
I am testing a Java benchmark suite, DaCapo, on gem5 in FS mode.
Unfortunately, I encounter a lot of SIGSEGV and null exceptions in
timing mode after restoring from checkpoints.
I am using Linux kernel v4.8.13 and ubuntu-server-16.04.1 with
Oracle JDK v8.0_171-b11. To eliminate the influence of my modifications to
gem5 src/ and configs/, I re-downloaded gem5 and checked out commit
"ee2ffdc0fdb489767768e5273a4ccd7b51735c7c", which is the gem5 version I am
working on. The checkpoint was taken using the KVM CPU with 1 CPU and 16GB
of memory. For the simulation, I use build/X86/gem5.opt (in order to enable
assertions) in FS mode (configs/example/fs.py). Other options include
"--cpu-type=DerivO3CPU -n 1 --mem-size=16GB --caches --l2cache
--l2_size=${L2SIZE}" (I try L2SIZE from 256KB to 8MB). When I test with 100ms
of warmup and 1ps of real simulation time, no errors are reported. But
with a longer real simulation time, the benchmark suite crashes with a
segfault.
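The L2 sweep described above could be generated with a few lines of Python. This is only a sketch: `l2_sweep_commands` is a hypothetical helper, the intermediate sizes (powers of two) and the "kB"/"MB" spelling are assumptions, and the checkpoint-restore and disk-image options are left out because the post does not quote them.

```python
def l2_sweep_commands(l2_sizes,
                      gem5_binary="build/X86/gem5.opt",
                      config="configs/example/fs.py"):
    """Return one argv list per L2 size, using the options quoted above."""
    base = [
        gem5_binary,
        config,
        "--cpu-type=DerivO3CPU",
        "-n", "1",
        "--mem-size=16GB",
        "--caches",
        "--l2cache",
    ]
    return [base + ["--l2_size=" + size] for size in l2_sizes]


# Assumed sweep points: powers of two between the two stated endpoints.
sizes = ["256kB", "512kB", "1MB", "2MB", "4MB", "8MB"]
commands = l2_sweep_commands(sizes)
for argv in commands:
    print(" ".join(argv))
```

Each argv list can be passed to `subprocess.run` once the remaining options for the run have been appended.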
I am able to run the DaCapo benchmark suite in FS mode with the KVM CPU
without any segfaults or exceptions. I have also tested some simple Java
benchmarks; neither segfaults nor exceptions occur.
Does anyone have suggestions for, or experience with, these issues?
best,
Da Zhang
_______________________________________________
gem5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users