Discussion:
RISCV ISA : "C" (compressed) extension supported?
Marcelo Brandalero
2018-05-24 23:06:19 UTC
Hi all,

I recently switched from gem5/x86 to gem5/RISCV due to some advantages of
this ISA.

I'm getting some weird simulation results, and I realized my compiler was
generating instructions for the compressed RISC-V ISA extension (chapter 12 in
the user-level ISA specification <https://riscv.org/specifications/>). The
weirdness disappears when I use *-march* to remove this extension.
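
One way to check whether a given binary was built for the compressed extension is the RVC bit in the ELF header's e_flags field. The following is a minimal sketch, assuming a little-endian ELF64 file; the function names and the synthetic header are illustrative only and are not part of gem5 or the toolchain.

```python
import struct

EM_RISCV = 243          # e_machine value for RISC-V
EF_RISCV_RVC = 0x0001   # e_flags bit: binary targets the C extension


def elf_uses_rvc(header: bytes) -> bool:
    """Return True if a little-endian ELF64 RISC-V header has the RVC flag set."""
    if len(header) < 52 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2 or header[5] != 1:
        raise ValueError("this sketch handles little-endian ELF64 only")
    (e_machine,) = struct.unpack_from("<H", header, 18)
    if e_machine != EM_RISCV:
        raise ValueError("not a RISC-V binary")
    (e_flags,) = struct.unpack_from("<I", header, 48)
    return bool(e_flags & EF_RISCV_RVC)


def make_header(e_flags: int) -> bytes:
    """Build a synthetic 64-byte ELF64 header, for demonstration only."""
    header = bytearray(64)
    header[:4] = b"\x7fELF"
    header[4] = 2   # ELFCLASS64
    header[5] = 1   # ELFDATA2LSB (little-endian)
    struct.pack_into("<H", header, 18, EM_RISCV)
    struct.pack_into("<I", header, 48, e_flags)
    return bytes(header)


with_rvc = elf_uses_rvc(make_header(0x5))     # RVC bit set
without_rvc = elf_uses_rvc(make_header(0x4))  # RVC bit clear
print(with_rvc, without_rvc)
```

For a real binary, the first 64 bytes of the file (read in binary mode) would be passed to elf_uses_rvc() instead of the synthetic header.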

*So the question is: does gem5/RISCV support this ISA extension?* If so, I
can share the weird results (maybe I'm missing something), but basically a
wide-issue O3 processor fetches at most 1 instruction per cycle when it should
probably be fetching more.

If it isn't supported, that's fine; I just find it a bit odd that the
program executes normally with no warnings whatsoever.

Best regards,
--
Marcelo Brandalero
PhD Candidate
Programa de Pós Graduação em Computação
Universidade Federal do Rio Grande do Sul
Jason Lowe-Power
2018-05-24 23:20:36 UTC
Hi Marcelo,

I'm not sure if RISC-V has been tested with the out of order CPU at all!
I'm happy that at least it doesn't completely fail!

For your problem of only fetching 1 instruction per cycle... I think it's
going to take some digging. My first guess would be that it could be a
problem with the advancePC() function that's implemented in the RISC-V
decoder (in gem5/arch/riscv), but I don't really have any specific reason
to think that :).

You could try turning on some debug flags for the O3 CPU. Specifically,
Fetch might be helpful.
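
A minimal sketch of how such a run might be launched. Only the Fetch debug flag comes from the suggestion above; the gem5 binary path, the se.py config script, the CPU type name (which varies between gem5 versions), and the benchmark file name are assumptions about a typical checkout.

```python
import shlex


def gem5_fetch_debug_cmd(binary, gem5="build/RISCV/gem5.opt",
                         config="configs/example/se.py"):
    """Build a gem5 command line with the Fetch debug flag enabled.

    The gem5 and config paths are assumptions, not taken from the thread.
    """
    return [
        gem5,
        "--debug-flags=Fetch",     # gem5 options go before the config script
        config,
        "--cpu-type=DerivO3CPU",   # out-of-order CPU model (name is version-dependent)
        "--caches",
        "-c", binary,
    ]


cmd = gem5_fetch_debug_cmd("./my_kernel.riscv")  # hypothetical binary name
print(shlex.join(cmd))
```

The trace is verbose, so redirecting the output to a file is usually worthwhile.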

Cheers,
Jason
_______________________________________________
gem5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
Alec Roelke
2018-05-25 00:06:33 UTC
Hi Marcelo,

Yes, gem5 does support the C extension (64-bit version only, though). I
don't know what could be causing your particular issue. I'm not sure
advancePC is the issue, though, because all that essentially does is call
PCState::advance(), which is inherited unchanged from
GenericISA::UPCState. Try doing as Jason suggests and run your simulation
with the Fetch debug flag enabled, and maybe that will shed some light on
the issue.

-Alec
Marcelo Brandalero
2018-05-25 00:33:05 UTC
Hi Jason, Alec,

Thanks for the fast responses!

I can say I managed to run a lot of benchmarks on O3 and none of them
crashed. I did notice, however, that their performance across O3 processors
of different widths showed only minor differences (on x86, the differences
were much more significant).

I ran into this particular issue only today, though, so I can only say it
*seems to affect only binaries compiled with the C extension*.

I'll run the tests suggested by both of you and reply here in case I find
anything interesting.

Best regards,
Marcelo Brandalero
2018-05-25 15:20:18 UTC
Hi Jason, Alec,

Just to provide some feedback on this issue: it seems that the processor is
mistakenly identifying (add reg, reg, reg) in compressed format as a branch
instruction.

I'm running a kernel that looks like this (output of
*riscv64-unknown-elf-objdump -D*):

000000000001019a <myFunction>:
1019a: 06400793 li a5,100
1019e: 4701 li a4,0
101a0: 4681 li a3,0
101a2: 4601 li a2,0
101a4: 0c800513 li a0,200
101a8: 952a add a0,a0,a0
101aa: 9632 add a2,a2,a2
101ac: 96b6 add a3,a3,a3
101ae: 973a add a4,a4,a4
101b0: 952a add a0,a0,a0
101b2: 9632 add a2,a2,a2
101b4: 96b6 add a3,a3,a3
101b6: 973a add a4,a4,a4
(repeat the four instructions above until this:)
104b8: 952a add a0,a0,a0
104ba: 9632 add a2,a2,a2
104bc: 96b6 add a3,a3,a3
104be: 973a add a4,a4,a4
104c0: 952a add a0,a0,a0
104c2: 2501 sext.w a0,a0
104c4: 9632 add a2,a2,a2
104c6: 2601 sext.w a2,a2
104c8: 96b6 add a3,a3,a3
104ca: 2681 sext.w a3,a3
104cc: 973a add a4,a4,a4
104ce: 2701 sext.w a4,a4
104d0: 37fd addiw a5,a5,-1
104d2: cc079be3 bnez a5,101a8 <myFunction+0xe>
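
In the base RISC-V encoding, an instruction whose two lowest bits are not 11 is 16 bits wide (compressed); otherwise it is 32 bits wide. The sketch below applies that rule to a few encodings copied from the listing above and checks it against the listed addresses. It ignores the longer (48-bit and wider) formats, and the function name is illustrative.

```python
def inst_length(encoding: int) -> int:
    """Return the instruction length in bytes (16- and 32-bit formats only)."""
    return 4 if (encoding & 0b11) == 0b11 else 2


# (address, encoding) pairs copied from the objdump listing
listing = [
    (0x1019a, 0x06400793),  # li a5,100   (32-bit)
    (0x1019e, 0x4701),      # li a4,0     (compressed)
    (0x101a0, 0x4681),      # li a3,0     (compressed)
    (0x101a2, 0x4601),      # li a2,0     (compressed)
    (0x101a4, 0x0c800513),  # li a0,200   (32-bit)
    (0x101a8, 0x952a),      # add a0,a0,a0 (compressed)
]

# Each address plus the decoded length should give the next listed address.
consistent = all(
    addr + inst_length(enc) == next_addr
    for (addr, enc), (next_addr, _) in zip(listing, listing[1:])
)
print(consistent)
```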

And what the Fetch stage looks like when fetching this code block is this:

4048968: system.cpu.fetch: [tid:0] Waking up from cache miss.
4048968: system.cpu.fetch: Running stage.
4048968: system.cpu.fetch: Attempting to fetch from [tid:0]
4048968: system.cpu.fetch: [tid:0]: Icache miss is complete.
4048968: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4048968: system.cpu.fetch: [tid:0]: Instruction PC 0x101a8 (0) created [sn:8124].
4048968: system.cpu.fetch: [tid:0]: Instruction is: c_add a0, a0, a0
4048968: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4048968: system.cpu.fetch: Branch detected with PC = (0x101a8=>0x101aa).(0=>1)*
4048968: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4048968: system.cpu.fetch: [tid:0][sn:8124]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049281: system.cpu.fetch: Running stage.
4049281: system.cpu.fetch: Attempting to fetch from [tid:0]
4049281: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049281: system.cpu.fetch: [tid:0]: Instruction PC 0x101aa (0) created [sn:8125].
4049281: system.cpu.fetch: [tid:0]: Instruction is: c_add a2, a2, a2
4049281: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049281: system.cpu.fetch: Branch detected with PC = (0x101aa=>0x101ac).(0=>1)*
4049281: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049281: system.cpu.fetch: [tid:0][sn:8125]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049594: system.cpu.fetch: Running stage.
4049594: system.cpu.fetch: Attempting to fetch from [tid:0]
4049594: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049594: system.cpu.fetch: [tid:0]: Instruction PC 0x101ac (0) created [sn:8126].
4049594: system.cpu.fetch: [tid:0]: Instruction is: c_add a3, a3, a3
4049594: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049594: system.cpu.fetch: Branch detected with PC = (0x101ac=>0x101ae).(0=>1)*
4049594: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049594: system.cpu.fetch: [tid:0][sn:8126]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4049907: system.cpu.fetch: Running stage.
4049907: system.cpu.fetch: Attempting to fetch from [tid:0]
4049907: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4049907: system.cpu.fetch: [tid:0]: Instruction PC 0x101ae (0) created [sn:8127].
4049907: system.cpu.fetch: [tid:0]: Instruction is: c_add a4, a4, a4
4049907: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4049907: system.cpu.fetch: Branch detected with PC = (0x101ae=>0x101b0).(0=>1)*
4049907: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4049907: system.cpu.fetch: [tid:0][sn:8127]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4050220: system.cpu.fetch: Running stage.
4050220: system.cpu.fetch: Attempting to fetch from [tid:0]
4050220: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4050220: system.cpu.fetch: [tid:0]: Instruction PC 0x101b0 (0) created [sn:8128].
4050220: system.cpu.fetch: [tid:0]: Instruction is: c_add a0, a0, a0
4050220: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4050220: system.cpu.fetch: Branch detected with PC = (0x101b0=>0x101b2).(0=>1)*
4050220: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4050220: system.cpu.fetch: [tid:0][sn:8128]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4050533: system.cpu.fetch: Running stage.
4050533: system.cpu.fetch: Attempting to fetch from [tid:0]
4050533: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4050533: system.cpu.fetch: [tid:0]: Instruction PC 0x101b2 (0) created [sn:8129].
4050533: system.cpu.fetch: [tid:0]: Instruction is: c_add a2, a2, a2
4050533: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4050533: system.cpu.fetch: Branch detected with PC = (0x101b2=>0x101b4).(0=>1)*
4050533: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4050533: system.cpu.fetch: [tid:0][sn:8129]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
4050846: system.cpu.fetch: Running stage.
4050846: system.cpu.fetch: Attempting to fetch from [tid:0]
4050846: system.cpu.fetch: [tid:0]: Adding instructions to queue to decode.
4050846: system.cpu.fetch: [tid:0]: Instruction PC 0x101b4 (0) created [sn:8130].
4050846: system.cpu.fetch: [tid:0]: Instruction is: c_add a3, a3, a3
4050846: system.cpu.fetch: [tid:0]: Fetch queue entry created (1/256).
*4050846: system.cpu.fetch: Branch detected with PC = (0x101b4=>0x101b6).(0=>1)*
4050846: system.cpu.fetch: [tid:0]: Done fetching, predicted branch instruction encountered.
4050846: system.cpu.fetch: [tid:0][sn:8130]: Sending instruction to decode from fetch queue. Fetch queue size: 1.
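
For longer traces, a small script can pair each "Branch detected" message with the instruction that triggered it. This is a minimal sketch based only on the message formats visible in the trace above; it assumes each message sits on a single line (wrapped lines would need to be re-joined first), and the function name is illustrative.

```python
import re

INST_RE = re.compile(r"Instruction is: (\S+)")
BRANCH_RE = re.compile(
    r"Branch detected with PC = \((0x[0-9a-f]+)=>(0x[0-9a-f]+)\)"
)


def flagged_instructions(log_text):
    """Yield (mnemonic, pc, next_pc) for each 'Branch detected' message.

    Each message is paired with the most recent 'Instruction is:' line.
    """
    last_mnemonic = None
    for line in log_text.splitlines():
        match = INST_RE.search(line)
        if match:
            last_mnemonic = match.group(1)
            continue
        match = BRANCH_RE.search(line)
        if match:
            yield last_mnemonic, int(match.group(1), 16), int(match.group(2), 16)


sample = """\
4048968: system.cpu.fetch: [tid:0]: Instruction is: c_add a0, a0, a0
4048968: system.cpu.fetch: Branch detected with PC = (0x101a8=>0x101aa).(0=>1)
4049281: system.cpu.fetch: [tid:0]: Instruction is: c_add a2, a2, a2
4049281: system.cpu.fetch: Branch detected with PC = (0x101aa=>0x101ac).(0=>1)
"""

hits = list(flagged_instructions(sample))
for mnemonic, pc, npc in hits:
    print(f"{mnemonic} at {pc:#x}: next PC is pc+{npc - pc}")
```

In the excerpt, every flagged instruction is a c_add whose next PC is simply pc+2, i.e. plain sequential execution of a 16-bit instruction.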

Not sure if it's a decoder problem or what, but it seems to affect only
instructions in the compressed format. It manifests itself in the
statistics with the following abnormal behavior:

system.cpu.fetch.rateDist::0          13812  23.92%  23.92%  # Number of instructions fetched each cycle (Total)
*system.cpu.fetch.rateDist::1         42910  74.32%  98.24%  # Number of instructions fetched each cycle (Total)*
system.cpu.fetch.rateDist::2            624   1.08%  99.32%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::3            256   0.44%  99.77%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::4             59   0.10%  99.87%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::5             50   0.09%  99.95%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::6              5   0.01%  99.96%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::7              2   0.00%  99.97%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::8             19   0.03% 100.00%  # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::overflows      0   0.00% 100.00%  # Number of instructions fetched each cycle (Total)
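
To put a single number on this histogram, the average fetch rate can be computed directly from the counts. A minimal sketch using the values copied from the statistics above; the variable names are illustrative.

```python
# Cycles observed for each "instructions fetched this cycle" bucket
rate_dist = {0: 13812, 1: 42910, 2: 624, 3: 256, 4: 59,
             5: 50, 6: 5, 7: 2, 8: 19}

total_cycles = sum(rate_dist.values())
total_insts = sum(n * count for n, count in rate_dist.items())
mean_rate = total_insts / total_cycles
at_most_one = (rate_dist[0] + rate_dist[1]) / total_cycles

print(f"cycles sampled: {total_cycles}")
print(f"mean instructions fetched per cycle: {mean_rate:.2f}")
print(f"cycles fetching at most one instruction: {at_most_one:.2%}")
```

This works out to roughly 0.79 instructions fetched per cycle, with 98.24% of cycles fetching zero or one instruction, on a configuration whose histogram extends to 8 per cycle.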

I won't be digging further into this, since running without compressed
format seems to fix the issue and is enough for my usage scenario. Just
thought this information could be useful for someone.

Cheers!


Jason Lowe-Power
2018-05-28 15:48:27 UTC
Hi Marcelo,

For future reference, if someone else has this issue: another possibility
is that the branch predictor is the problem. It looks like it could be
predicting that the instruction is a branch. I'm not sure whether that's
specifically because of the compressed format, though. It's another place
for the next person to start digging.
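
One hypothesis consistent with the trace earlier in the thread (every flagged PC pair differs by 2, not 4) is that some check treats "next PC is not pc+4" as a taken branch. The toy model below only illustrates that idea; it is a guess, not a confirmed diagnosis, and neither function is gem5 code.

```python
FIXED_INST_BYTES = 4  # assumption: the check treats every instruction as 32 bits wide


def naive_is_branching(pc, npc):
    """Hypothetical check: anything not falling through to pc+4 looks like a branch."""
    return npc != pc + FIXED_INST_BYTES


def size_aware_is_branching(pc, npc, inst_bytes):
    """Same check, but using the actual size of the instruction at pc."""
    return npc != pc + inst_bytes


# (pc, next pc) pairs copied from the Fetch trace; all are 2-byte c_add instructions
pairs = [(0x101a8, 0x101aa), (0x101aa, 0x101ac), (0x101ac, 0x101ae)]

naive = [naive_is_branching(pc, npc) for pc, npc in pairs]
aware = [size_aware_is_branching(pc, npc, 2) for pc, npc in pairs]
print(naive, aware)
```

Under this model, the fixed-width check flags every compressed instruction as a branch, while the size-aware check flags none of them. That would match the observed one-instruction-per-cycle fetch pattern, but it would need to be verified against the gem5 source.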

Cheers,
Jason