Discussion:
forward invalidations to lsq
Dibakar Gope
2012-06-21 22:09:55 UTC
Hi All,

I was skimming through the O3+Ruby portion of the current dev repository code, which attempts to support load-load ordering for a stronger consistency model. In the existing code, the L1 cache controller sends a forward_eviction_to_cpu, which in turn sets the hitExternalSnoop flag of a load instruction (provided the load has not committed yet) through the checkSnoop() function. The checkViolations() portion of the code says that, in order for that load to be squashed and re-executed, there has to be another, supposedly older, load that maps to the same cache block (the first if () block). The code snippet is shown below:


lsq_unit_impl.hh
checkViolations()
------------
if (inst_eff_addr2 >= ld_eff_addr1 && inst_eff_addr1 <= ld_eff_addr2) {
    if (inst->isLoad()) {
        // If this load is to the same block as an external snoop
        // invalidate that we've observed then the load needs to be
        // squashed as it could have newer data
        if (ld_inst->hitExternalSnoop) {
            if (!memDepViolator ||
                    ld_inst->seqNum < memDepViolator->seqNum) {
                DPRINTF(LSQUnit, "Detected fault with inst [sn:%lli] "
                        "and [sn:%lli] at address %#x\n",
                        inst->seqNum, ld_inst->seqNum, ld_eff_addr1);
                memDepViolator = ld_inst;
                ++lsqMemOrderViolation;
                return new GenericISA::M5PanicFault(
                        "Detected fault with inst [sn:%lli] and "
                        "[sn:%lli] at address %#x\n",
                        inst->seqNum, ld_inst->seqNum, ld_eff_addr1);
            }
        }
        // Otherwise, mark the load has a possible load violation
        // and if we see a snoop before it's commited, we need to squash
        ld_inst->possibleLoadViolation = true;
        DPRINTF(LSQUnit, "Found possible load violaiton at addr: %#x"
                " between instructions [sn:%lli] and [sn:%lli]\n",
                inst_eff_addr1, inst->seqNum, ld_inst->seqNum);
    } else {
---------------
---------------


In my understanding, if a snoop hits a younger load in the LSQ before it is committed, the load needs to be re-executed, without any constraints from the checkViolations() function, to maintain a stronger consistency model. Consider the following simple example:

c0      c1
St B    Ld A
St A    Ld B

If Ld B in core 1 is executed out of order and later sees a snoop before commit, shouldn't we re-execute Ld B without any constraints from the checkViolations() function? Am I missing something here?
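
To make the two-core example concrete, here is a minimal sketch (a toy model, not gem5 code) that enumerates every interleaving of the four operations. It assumes A and B start at 0 and each store writes 1; the names Op, run, pos and outcomes are made up for illustration. With both loads kept in program order, core 1 can never see the new A together with the old B; once Ld B is allowed to run ahead of Ld A without being squashed, that outcome shows up.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>
#include <utility>

// Toy model, not gem5 code: memory is two words, A and B, both initially 0.
enum Op { StB, StA, LdA, LdB };

// Execute one global order of the four operations and return the values
// (rA, rB) that core 1 reads.
std::pair<int, int> run(const std::array<Op, 4> &order)
{
    int A = 0, B = 0, rA = -1, rB = -1;
    for (Op op : order) {
        switch (op) {
          case StB: B = 1; break;
          case StA: A = 1; break;
          case LdA: rA = A; break;
          case LdB: rB = B; break;
        }
    }
    return std::make_pair(rA, rB);
}

// Position of an operation within a global order.
int pos(const std::array<Op, 4> &order, Op op)
{
    return static_cast<int>(
        std::find(order.begin(), order.end(), op) - order.begin());
}

// Collect all outcomes over the global orders that keep St B before St A
// on core 0. If loadsInOrder is true, Ld A must also precede Ld B on core 1.
std::set<std::pair<int, int>> outcomes(bool loadsInOrder)
{
    // Starts sorted by enum value so next_permutation visits every order.
    std::array<Op, 4> order = {StB, StA, LdA, LdB};
    std::set<std::pair<int, int>> seen;
    do {
        if (pos(order, StB) > pos(order, StA))
            continue;
        if (loadsInOrder && pos(order, LdA) > pos(order, LdB))
            continue;
        seen.insert(run(order));
    } while (std::next_permutation(order.begin(), order.end()));
    return seen;
}
```

Calling outcomes(true) gives the results allowed when load-load order is preserved, and outcomes(false) shows what becomes visible if an early Ld B is never re-executed: the pair (rA, rB) = (1, 0) appears only in the second set.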


Regards,
Dibakar
Ali Saidi
2012-06-22 04:06:04 UTC
It completely depends on what consistency model you're going for.
The current code doesn't support sequential consistency, but the
load-load ordering that is enforced is in line with ARM's ordering
requirements.

Ali
Dibakar Gope
2012-06-22 12:46:01 UTC
Hi Ali,

Thanks for the response. OK, I got the point. I thought that since the O3 CPU attempts to support TSO for x86, it would inherently enforce the regular load-load ordering present in any stronger consistency model. But if it is in line with ARM's requirements, does it not violate the conventional load-load ordering of x86 and TSO?


thanks,
Dibakar
Ali Saidi
2012-06-22 13:47:33 UTC
Hi Dibakar,

I'd have to think carefully about it, but you may be right about TSO.
I'd hope that someone who is more familiar with x86 could respond.

Thanks,

Ali

Nilay
2012-06-23 03:50:10 UTC
What's the difference between ARM's load-load ordering and TSO? I am
guessing that in ARM not all instructions are flushed from the pipeline,
but only those that are affected by the snoop. My understanding is that
the O3 CPU flushes the entire pipeline when it sees that an instruction
needs to execute again. Since instructions commit in order, any load that
gets squashed would mean that all subsequent loads are squashed as well.

--
Nilay
Ali Saidi
2012-06-25 19:19:13 UTC
ARM just requires load-load ordering (which is stronger than Alpha). x86, to my knowledge, requires all stores in the system to be visible in the same order.

Ali
Dibakar Gope
2012-06-27 22:08:55 UTC
Hi Ali,

From this thread, http://www.mail-archive.com/gem5-***@gem5.org/msg00782.html, I get the idea that a snoop invalidate will make a younger load and the instructions following it re-execute only if an older load in program order to the same cache block sees an updated value. But I am still not sure whether this obeys the load-load ordering of a stronger consistency model other than ARM's. Suppose, for example:

C0      C1
St A    Ld C
St B    Ld A

In the above scenario, if the memory order becomes Ld A -> St A -> St B -> Ld C and C1 receives an invalidation for cache block A before Ld A makes it to the front of the commit queue, the checkViolations() code still won't squash Ld A and the younger instructions, which would be needed to maintain strong consistency.
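
The behaviour described above can be sketched with a small stand-alone model. This is only an approximation of how I read the two flags, not the gem5 implementation: the Load struct, snoop(), execute() and youngerLoadReExecuted() are hypothetical names, and the memDepViolator bookkeeping is left out. It shows that the younger Ld A is marked for re-execution only when the older load maps to the same block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a load-queue entry. The two flag names follow
// the snippet quoted earlier in this thread; everything else is assumed.
struct Load {
    std::uint64_t seqNum;   // program order, smaller means older
    std::uint64_t block;    // cache block address
    bool executed = false;
    bool hitExternalSnoop = false;
    bool possibleLoadViolation = false;
    bool mustReExecute = false;

    Load(std::uint64_t s, std::uint64_t b) : seqNum(s), block(b) {}
};

// A snoop invalidation for `block` arrives at the load queue.
void snoop(std::vector<Load> &lq, std::uint64_t block)
{
    for (Load &ld : lq) {
        if (!ld.executed || ld.block != block)
            continue;
        if (ld.possibleLoadViolation)
            ld.mustReExecute = true;     // an older same-block load ran after it
        else
            ld.hitExternalSnoop = true;  // remembered for a later, older load
    }
}

// The load at index `idx` executes and checks younger loads to the same
// block that have already executed.
void execute(std::vector<Load> &lq, std::size_t idx)
{
    assert(idx < lq.size());
    Load &me = lq[idx];
    me.executed = true;
    for (Load &ld : lq) {
        if (ld.seqNum <= me.seqNum || !ld.executed || ld.block != me.block)
            continue;
        if (ld.hitExternalSnoop)
            ld.mustReExecute = true;
        else
            ld.possibleLoadViolation = true;
    }
}

// C1 runs "Ld <olderBlock>; Ld A". Ld A executes first, a snoop for block A
// arrives, then the older load executes. Returns whether Ld A is re-executed.
bool youngerLoadReExecuted(std::uint64_t olderBlock)
{
    const std::uint64_t blockA = 0xA0;
    std::vector<Load> lq = {Load(1, olderBlock), Load(2, blockA)};
    execute(lq, 1);
    snoop(lq, blockA);
    execute(lq, 0);
    return lq[1].mustReExecute;
}
```

With youngerLoadReExecuted(0xA0) the older load hits the same block and Ld A is marked; with youngerLoadReExecuted(0xC0), which matches the Ld C / Ld A scenario, the snoop only sets hitExternalSnoop and nothing is squashed.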


My other question is whether we can use the squashDueToMemOrder() squash mechanism instead of the ReExec fault if I want to squash Ld A and the younger instructions and re-fetch them in the above scenario. ReExec waits for the faulted instruction to reach the front of the commit queue; is there any other fundamental difference between ReExec and squashDueToMemOrder() besides this?


Thanks,
--Dibakar
Ali Saidi
2012-06-27 23:17:28 UTC
Hi Dibakar,

I'm not saying that I believe this is correct for x86. It seems like x86
does require more ordering than is currently provided by the LSQ.
Hopefully someone with more x86 experience could chime in and confirm
that. The faulting mechanism needs an overhaul in the O3 CPU. There
shouldn't be any fundamental difference between ReExec and
squashDueToMemOrder().

Thanks,

Ali

Dibakar Gope
2012-07-13 16:47:35 UTC
Hi Nilay,

Sorry for the late response, I hadn't checked my email since last night :).


Anyway, the checkViolations() part that we are talking about makes sure that coherence is not violated on a CMP, but it does not re-execute a load (one that is not at the front of the commit queue) and the following younger instructions upon receiving a snoop invalidation request. So, in my understanding, it does not enforce the strict load-load ordering of a stronger model. I therefore added a couple of lines to checkSnoop(); see the changes below.
(1) I think the first if clause in the existing code was wrong: "if (load_idx == loadTail)" actually checks the "If there are no loads in the LSQ we don't care" condition. So, with an additional if clause, I make sure that nothing needs to be done if the snoop hits the load at the front of the load queue.
(2) Further, I added a clause towards the end of checkSnoop(), guarded by a needsSC condition, so that if the snoop hits an executed load that is not at the front of the queue, the load is re-executed using ReExec (hopefully ReExec squashes that load and all the younger instructions and re-fetches them, as I understood from Ali's response).


The other change that I made to maintain SC is to add a few more constraints on the load queue to ensure store-load ordering, i.e. a load in the load queue cannot retire from the ROB until the committed stores before it in program order are exposed to the memory system. As a result, a load can still receive snoop invalidations and be re-executed if needed. I can post my changes to enforce SC for review.


template <class Impl>
void
LSQUnit<Impl>::checkSnoop(PacketPtr pkt)
{
    int load_idx = loadHead;

    if (!cacheBlockMask) {
        assert(dcachePort);
        Addr bs = dcachePort->peerBlockSize();

        // Make sure we actually got a size
        assert(bs != 0);

        cacheBlockMask = ~(bs - 1);
    }

    // If there are no loads in the LSQ we don't care
    if (load_idx == loadTail) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }

    // If this is the only load in the LSQ we don't care
    if (loadTail == (load_idx + 1)) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }
    incrLdIdx(load_idx);
    DPRINTF(LSQUnit, "Got snoop for address %#x\n", pkt->getAddr());
    Addr invalidate_addr = pkt->getAddr() & cacheBlockMask;
    while (load_idx != loadTail) {
        DynInstPtr ld_inst = loadQueue[load_idx];

        if (!ld_inst->effAddrValid || ld_inst->uncacheable()) {
            incrLdIdx(load_idx);
            continue;
        }

        Addr load_addr = ld_inst->physEffAddr & cacheBlockMask;
        DPRINTF(LSQUnit, "-- inst [sn:%lli] load_addr: %#x to pktAddr:%#x\n",
                ld_inst->seqNum, load_addr, invalidate_addr);

        if (load_addr == invalidate_addr) {
            if (ld_inst->possibleLoadViolation) {
                DPRINTF(LSQUnit, "Conflicting load at addr %#x [sn:%lli]\n",
                        ld_inst->physEffAddr, pkt->getAddr(), ld_inst->seqNum);

                // Mark the load for re-execution
                ld_inst->fault = new ReExec;
            } else {
                // If a older load checks this and it's true
                // then we might have missed the snoop
                // in which case we need to invalidate to be sure
                ld_inst->hitExternalSnoop = true;

                if (needsSC == true){

                    ld_inst->fault = new ReExec;
                }
            }
        }
        incrLdIdx(load_idx);
    }
    return;
}
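
The effect of the needsSC clause can be tried out in isolation with a rough stand-alone sketch. It is a simplification under stated assumptions, not the gem5 classes: Entry, LoadQueue, push(), next() and snoopBlockA() are hypothetical, and a reExec flag stands in for "fault = new ReExec". Like the function above, it skips the load at the head of the circular queue and marks any other executed load to the snooped block when needsSC is set.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified load-queue entry (not the gem5 DynInst).
struct Entry {
    std::uint64_t block = 0;
    bool executed = false;
    bool possibleLoadViolation = false;
    bool hitExternalSnoop = false;
    bool reExec = false;   // stands in for "fault = new ReExec"
};

struct LoadQueue {
    std::vector<Entry> q;
    int head = 0;          // oldest load
    int tail = 0;          // one past the youngest load
    bool needsSC = false;

    explicit LoadQueue(int entries) : q(entries) {}

    // Circular increment, playing the role of incrLdIdx().
    int next(int i) const { return (i + 1) % static_cast<int>(q.size()); }

    void push(std::uint64_t block, bool executed)
    {
        assert(next(tail) != head);  // one slot is kept free
        q[tail] = Entry();
        q[tail].block = block;
        q[tail].executed = executed;
        tail = next(tail);
    }

    // Returns the number of loads marked for re-execution by this snoop.
    int checkSnoop(std::uint64_t block)
    {
        if (head == tail)
            return 0;                // no loads at all
        int marked = 0;
        // Start after the head: the oldest load is never re-executed here.
        for (int i = next(head); i != tail; i = next(i)) {
            Entry &e = q[i];
            if (!e.executed || e.block != block)
                continue;
            if (e.possibleLoadViolation) {
                e.reExec = true;
                ++marked;
            } else {
                e.hitExternalSnoop = true;
                if (needsSC) {
                    e.reExec = true;
                    ++marked;
                }
            }
        }
        return marked;
    }
};

// Ld A has executed and a snoop for block A arrives. If ldAIsHead is false,
// an older, not yet executed Ld C sits in front of it.
int snoopBlockA(bool needsSC, bool ldAIsHead)
{
    LoadQueue lq(4);
    lq.needsSC = needsSC;
    if (!ldAIsHead)
        lq.push(0xC0, false);
    lq.push(0xA0, true);
    return lq.checkSnoop(0xA0);
}
```

In this model snoopBlockA(false, false) marks nothing, snoopBlockA(true, false) marks Ld A, and snoopBlockA(true, true) leaves a lone head load untouched. The second early return in the posted function is covered by the loop bounds here, since a queue with a single load never enters the loop.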
Dibakar, any progress on this front?
Ali Saidi
2012-07-13 17:28:56 UTC
If you could post it for review, it would be a lot easier to understand,
since the email seems to have stripped all indenting.


Thanks,

Ali
Dibakar Gope
2012-07-13 17:57:38 UTC
Sure, I will do that. Let me see how I can make a diff file with all the changes (changes are needed to obey the store-load ordering of a stronger model too!) and post it for review.

Thanks,
dibakar