Discussion:
Monitor and dump the cache state during full system simulation
Shuai Wang
2016-12-30 04:38:06 UTC
Dear list,


I am using gem5's full-system simulation to analyze the cache accesses of
some x86 binary code. I have been able to add a monitor between the CPU
and the L1 data cache to track all cache accesses while executing the
binary on the simulated OS.

Now I would like to go one step further and dump the cache state during
the execution of the binary. After a quick search online, I could not
find any useful information, and I am wondering whether it is actually
possible to do so?

Could anyone provide some pointers on this task? Thank you in
advance!

Sincerely,
Shuai
Jason Lowe-Power
2017-01-02 16:01:29 UTC
Hi Shuai,

There is currently nothing built into gem5 to dump the cache state (unless
you're using Ruby, in which case you can look at the checkpointing code in
the RubySystem class and the CacheTrace class). However, it should be
pretty simple to dump the data in the classic caches. You would need to
get a pointer to each of the caches, then add a function to the Cache
class that dumps the data. You may be able to leverage the DDUMP macro,
which formats data in a reasonable way. Or, if the output will only be
consumed by other code, you can look into gem5's protobuf support for
dumping/consuming data.
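
Not gem5 code, but as a rough, self-contained sketch of what such a dump
function could look like (a toy cache model; gem5's actual Cache and tag
store classes differ):

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Toy cache model: numSets x numWays lines, each holding a valid bit and
// a tag. This stands in for the real tag store.
struct ToyLine { bool valid = false; uint64_t tag = 0; };

struct ToyCache {
    int numSets, numWays;
    std::vector<ToyLine> lines;  // row-major: set * numWays + way

    ToyCache(int s, int w) : numSets(s), numWays(w), lines(s * w) {}

    // The kind of helper described above: walk every line and format its
    // state. Here it returns a string; gem5 would print via DDUMP/DPRINTF.
    std::string dumpState() const {
        std::ostringstream out;
        for (int s = 0; s < numSets; ++s) {
            for (int w = 0; w < numWays; ++w) {
                const ToyLine &l = lines[s * numWays + w];
                if (l.valid) {
                    out << "set " << s << " way " << w << " tag 0x"
                        << std::hex << l.tag << std::dec << "\n";
                }
            }
        }
        return out.str();
    }
};
```

In gem5 one would add an analogous member function to the C++ Cache class
and walk its tag store instead of a plain vector.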

Cheers,
Jason
--
Jason
Shuai Wang
2017-01-03 16:21:15 UTC
Dear Jason,

Thank you so much for your reply. Could you please elaborate on how to
"implement a function in Caches.py to dump the data"? As far as I can
see, only some cache parameters are defined in that script, and I have no
idea how to bridge the code there with the runtime cache state (my focus
is the L1 D-cache)...

I am not a systems person, so I sincerely apologize if this is actually
quite obvious... Thank you so much in advance!

Sincerely,
Shuai
_______________________________________________
gem5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
Majid Namaki Shoushtari
2017-01-03 20:12:14 UTC
Hi Shuai,

I don't think Jason meant that you need to add a function to Caches.py. You
will need to add something to the C++ class (src/mem/cache/cache.hh/.cc).

I'm not sure what kind of information you need to dump, but basically all
of the incoming requests from the CPU are received here:
"Cache::CpuSidePort::recvTimingReq(PacketPtr pkt)"
and all of the responses to the CPU are scheduled wherever there is a call
to "cpuSidePort->schedTimingResp". There are currently four places where
responses to the CPU are scheduled. If you read the code, it's relatively
easy to figure out which call site covers which condition (hit, miss,
uncacheable access, etc.).

If you need to dump this information only for one or a few specific
caches, one way of doing it is to add a boolean parameter and make the
dumping conditional on its value. For that you will need to add the
parameter to Caches.py and possibly CacheConfig.py.
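
As a rough sketch of that switch (the name "dump_accesses" and its exact
placement are illustrative, not an existing gem5 option; gem5 SimObject
parameters are declared via m5.params on the Python side):

```python
# Hypothetical boolean parameter on a cache SimObject (sketch only).
# The C++ side receives the value through the generated params struct.
from m5.params import Param

class L1DCache(BaseCache):
    dump_accesses = Param.Bool(False, "dump cache state on each access")
```

On the C++ side the value would arrive through the cache's params struct
(for example, a member initialized from the constructor's params) and can
gate the dumping code.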

Cheers,
Majid
--
Majid Namaki Shoushtari
PhD Candidate
Department of Computer Science
University of California, Irvine
Irvine, CA 92697
***@uci.edu
http://www.ics.uci.edu/~anamakis
Shuai Wang
2017-01-04 04:13:15 UTC
Hi Majid,

Thank you so much for your detailed information. I greatly appreciate it.

I tried updating the code as you suggested, and dumping the information
from the C++ code works fine. However, I am still unsure how to interpret
the "cache state" information at this point. Could you please take a look
at the following questions and shed some light on them? Thank you!

The first question is that I can never observe a "cache miss":

What I am basically doing right now, as you suggested, is checking the
conditions around each "cpuSidePort->schedTimingResp" call to decide
whether the current memory access leads to a hit or a miss. However, after
running multiple test cases (including some small binaries and
medium-sized GNU Coreutils binaries), all I can find is the "hit"
(schedTimingResp at line 742 of cache.cc) and the schedTimingResp at line
1454 of cache.cc. I basically cannot find any "miss" (schedTimingResp at
line 801 of cache.cc). Am I missing anything here?

The second question is still about the interpretation of the cache state:

If I understand correctly, an N-bit memory address is dissected into the
following three parts during a memory access:

[image: address fields: tag | set index | block offset]

The "set index" is used to locate the cache set in which the data may be
stored, and the tag is used to confirm that the data is indeed present in
one of the cache lines of that set. In other words, I understand that the
"cache state" (hit, miss, etc.) should be associated with each cache set
for every memory access.

Given the above, I would like to confirm that the captured "hit/miss"
really represents the state of the accessed cache set. Or is it actually
a property of the individual cache lines?

Is my understanding correct? Any suggestions and advice would be
appreciated! Thank you!
Sincerely,
Shuai
Shuai Wang
2017-01-04 15:23:07 UTC
Also, while my instrumented code in the "schedTimingResp" function works
well in syscall-emulation mode, I find that the function is never executed
in full-system mode. Am I missing anything here?

This is the command I use:

./build/X86/gem5.opt --debug-flags=CacheDebug
--debug-file=cacheDebug.out.gz configs/example/fs.py
--disk-image=/home/test/work/x86_full_system/disks/linux-x86.img
--kernel=/home/test/work/x86_full_system/binaries/x86_64-vmlinux-3.2.1
--caches

By inserting a printf at the beginning of "schedTimingResp", I am pretty
sure this function is never invoked...
Shuai Wang
2017-01-04 15:52:21 UTC
Sorry, I meant the function "recvTimingReq", not "schedTimingResp"...
Majid Namaki Shoushtari
2017-01-04 20:09:25 UTC
Hi Shuai,

- Q1) I believe the schedTimingResp at line 1454 of cache.cc handles a
cache miss, not a hit.
- Q2) I'm not sure I understand your question. Your figure and description
of how the cache is looked up look fine to me. A hit or miss happens on
every access, so to use your terminology I would say it is defined for
cache lines: if the cache line that contains the data is present in the
cache, the access is a hit; otherwise it is a miss.
- Q3) I haven't dumped any cache data in full-system mode, but I expect it
to be similar to syscall-emulation mode. These two functions are triggered
when you use any of the timing CPU models (timing, minor, detailed). You
need to specify that using "--cpu-type".
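
To make the hit/miss reading concrete, the field decomposition can be
written out directly. A small self-contained example (the geometry below,
64-byte lines and 64 sets, is just an illustrative assumption, not tied to
any particular gem5 configuration):

```cpp
#include <cstdint>

// Split an address into block offset / set index / tag for a cache with
// line size 2^offsetBits bytes and 2^setBits sets.
struct AddrFields { uint64_t offset, set, tag; };

AddrFields decompose(uint64_t addr, unsigned offsetBits, unsigned setBits) {
    AddrFields f;
    f.offset = addr & ((1ULL << offsetBits) - 1);              // byte in line
    f.set    = (addr >> offsetBits) & ((1ULL << setBits) - 1); // which set
    f.tag    = addr >> (offsetBits + setBits);                 // line identity
    return f;
}
```

For example, decompose(0x12345678, 6, 6) yields offset 0x38, set 25, and
tag 0x12345: the access is a hit exactly when some way of set 25 holds a
valid line whose tag is 0x12345, and a miss otherwise.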

Good luck,
Majid
Jason Lowe-Power
2017-01-05 13:40:59 UTC
Hi Shuai,

By default, se.py/fs.py use the atomic CPU, which performs atomic memory
accesses. This CPU/memory mode is used to fast-forward the simulation and
does not accurately model timing. All of the memory requests/responses
flow through the "atomic" functions (recvAtomic). You should specify the
CPU type on the command line to use timing mode (e.g., --cpu-type=timing
or --cpu-type=detailed).
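
Concretely, the command from the earlier message only needs the extra
flag (paths are copied from that message; the flag spelling follows the
--cpu-type option above):

```shell
# Force a timing CPU so recvTimingReq/schedTimingResp actually run.
./build/X86/gem5.opt --debug-flags=CacheDebug \
    --debug-file=cacheDebug.out.gz configs/example/fs.py \
    --disk-image=/home/test/work/x86_full_system/disks/linux-x86.img \
    --kernel=/home/test/work/x86_full_system/binaries/x86_64-vmlinux-3.2.1 \
    --caches --cpu-type=timing
```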

Cheers,
Jason
Post by Shuai Wang
Sorry, what I mean is function "recvTimingReq", not function "
schedTimingResp"...
Besides, while my instrumented code in the "schedTimingResp" function
works well when leveraging the system call mode, I find the "schedTimingResp"
function is never executed in the full-system simulation mode. Am I missed
anything here?
./build/X86/gem5.opt --debug-flags=CacheDebug
--debug-file=cacheDebug.out.gz configs/example/fs.py
--disk-image=/home/test/work/x86_full_system/disks/linux-x86.img
--kernel=/home/test/work/x86_full_system/binaries/x86_64-vmlinux-3.2.1
--caches
By inserting some printf at the beginning of function "schedTimingResp ", I
am pretty sure this function is never invoked...
Hi Majid,
Thank you so much for your detailed information. I really appreciate it.
I tried to update the code as you said, and it works fine to dump the
information in the C++ code. However, I am still confused about how to
interpret the "cache state" information at this step. Could you please take a
look at the following questions and shed some light on them? Thank you!
So what I am basically doing right now, as you suggested, is to check the
conditions in the context of each "cpuSidePort->schedTimingResp" to
decide whether the current memory addressing leads to a hit or miss.
However, after running multiple test cases (including some small binaries
and medium-size GNU Coreutils binaries), all I can find is the "hit"
(schedTimingResp at line 742 of cache.cc) and the schedTimingResp at line 1454
of cache.cc. Basically I cannot find any "miss" (schedTimingResp at line 801
of cache.cc). Have I missed anything here?
If I understood correctly, given an N-bit memory address, it is dissected
as follows:
[image: Inline image 2]
The "set index" is used to locate the cache set in which the data may be
stored, and the tag is used to confirm that the data is indeed present in
one of the cache lines in that cache set. In other words, I understand that
the "cache state" (hit, miss, etc.) should be associated with each cache set
for every memory access.
Given the above context, I would like to confirm whether the captured
"hit/miss" really represents the cache state of the accessed cache set, or
whether it is actually a property of individual cache lines.
Am I clear on this? Any suggestion and advice would be appreciated! Thank
you!
Sincerely,
Shuai
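[The decomposition being asked about can be sketched in Python. This is only a toy model with hypothetical parameters (64-byte lines, 64 sets, evictions and associativity limits ignored), not gem5's implementation, but it shows that a hit or miss is resolved by a tag comparison within the indexed set:

```python
LINE_BYTES = 64   # hypothetical block size
NUM_SETS = 64     # hypothetical number of sets

def decompose(addr):
    """Split an address into (tag, set index, block offset)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset

# One collection of resident tags per cache set; evictions are ignored.
sets = [set() for _ in range(NUM_SETS)]

def access(addr):
    """Classify an access: a hit means the tag is already in the indexed set."""
    tag, index, _ = decompose(addr)
    if tag in sets[index]:
        return "hit"
    sets[index].add(tag)  # fill the line on a miss
    return "miss"

print(access(0x1000))  # first touch of a line -> miss
print(access(0x1008))  # same 64-byte line     -> hit
```

So in this view the hit/miss outcome belongs to the individual access (and the line it finds or misses), resolved within one set, rather than being a state of the set as a whole.]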
Hi Shuai,
I don't think Jason meant that you need to add a function to Caches.py.
You will need to add something to the C++ class (src/mem/cache/cache.hh/cc).
I'm not sure what kind of information you need to dump, but basically all
of the incoming requests from the CPU are received in
"Cache::CpuSidePort::recvTimingReq(PacketPtr pkt)", and all of the responses
to the CPU are scheduled wherever there is a call to
"cpuSidePort->schedTimingResp". There are currently four places where
responses to the CPU are scheduled. If you read the code, it's relatively easy
to figure out which call site covers which condition (hit, miss, uncacheable
access, etc.).
If you need to dump this information for one (some) specific cache(s)
only, one way of doing it is to pass a boolean variable and make it
conditional based on the value of that variable. For that you will need to
add the variable to Caches.py and possibly CacheConfig.py.
Cheers,
Majid
Dear Jason,
Thank you so much for your reply. Could you please elaborate more on how
to "implement a function in Caches.py to dump the data"? As far as I can
see, there are only some cache parameters defined in that script... I
really have no idea how I should bridge the code there with the runtime
cache state (my focus is the L1 D-cache)...
I am not a systems person, and I am sincerely sorry if this is actually quite
obvious... Thank you so much in advance!
Sincerely,
Shuai
--
Jason
Shuai Wang
2017-01-05 21:57:47 UTC
Permalink
Hey Majid and Jason,


Thank you so much for all this detailed information; I learned a lot from
it :) Now I am able to boot the full-system simulation and record the
cache state. I really appreciate your help!

While the current setup works well on all the binaries compiled from C
programs, I am stuck executing one C++ test case in the full-system
simulation mode...

My current configuration is: kernel 3.2.0 + Ubuntu 12.04.4 64-bit. I
compiled the kernel with the gem5-provided configuration
file: linux-2.6.28.4.

When running the binary code, it throws an exception:

pure virtual method called
terminate called without an active exception
Aborted

Of course, this seems more like a source-level bug, such as calling a
virtual function inside a constructor. However, I tried the same code on
various physical machines (including Ubuntu 12.04 64-bit with kernel 3.2.0
and Ubuntu 12.04 64-bit with kernel 3.8.0-44), and all of them work fine.

Besides, although debugging on the simulated platform is too slow, I used
strace to dump the system-call sequence when executing the statically linked
C++ binary and checked the constructor/destructor functions around the
call sites; I haven't seen any suspicious code so far.

I am starting to think that some configuration of the simulated platform
may be causing this issue. Does this problem look familiar to any of you?
If so, could you shed some light on it?
Thank you in advance!


Sincerely,
Shuai
Shuai Wang
2017-01-05 22:19:37 UTC
Permalink
Some debugging shows that this can actually be traced back to the
random-number-generator utility in the test case I use. I will see what I
can do from here.