Discussion:
[gem5-users] Understanding the SnoopFilter
Daniel Carvalho
2018-08-06 15:13:56 UTC
Permalink
Hello!
First I will give some background of what is happening. This will be used to formulate some questions.

I've got an error with the SnoopFilter after trying to implement cache compression:
panic: panic condition !(sf_item.holder & req_port) occurred: requester 1 is not a holder :( SF value 0.0
Memory Usage: 659104 KBytes
Program aborted at tick 6762500
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x28)[0x13c1948]
./build/X86/gem5.opt(_Z12abortHandleri+0x46)[0x13d44c6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f61720c3390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f61708a6428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f61708a802a]
./build/X86/gem5.opt[0x7061bf]
./build/X86/gem5.opt(_ZN11SnoopFilter13lookupRequestEPK6PacketRK9SlavePort+0xc7d)[0x77ec6d]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x107)[0x74b0d7]
./build/X86/gem5.opt(_ZN5Cache18doWritebacksAtomicERNSt7__cxx114listIP6PacketSaIS3_EEE+0xc3)[0x14aa163]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x392)[0x14a55a2]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x686)[0x74b656]
./build/X86/gem5.opt(_ZN5Cache19handleAtomicReqMissEP6PacketP8CacheBlkRNSt7__cxx114listIS1_SaIS1_EEE+0x149)[0x14ab569]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x41d)[0x14a562d]
./build/X86/gem5.opt(_ZN15AtomicSimpleCPU4tickEv+0x8cf)[0x147c02f]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xc5)[0x13c8135]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x50)[0x13e01c0]
./build/X86/gem5.opt(_Z8simulatem+0xd1b)[0x13e12ab]

I am using the following line to run Gem5:
    ./build/X86/gem5.opt --debug-flags=SnoopFilter,CacheAll ./configs/example/se.py -c <path_to_executable> --caches --l2cache --l2_size=4096

A superblock is a set of blocks that share a single tag. In this execution I allow a maximum of 8 blocks per superblock, although it can happen to other values too. The error only happens if there can be multiple instances of the same superblock coexisting in the cache (i.e., 2 blocks per superblock and that are not co-allocatable would generate two superblock entries, each with one of the blocks, but both with the same tag), however it is not certain to happen just because they coexist. This is likely to be caused by an insertion that is not being informed correctly.

Setting the debug flags to CacheAll,SnoopFilter generates the following output (isolated for the address that causes the error; "===+===" was inserted by me to inform that there are lines relative to other addresses in between).

4559500: system.cpu.dcache: createMissPacket: created ReadExReq [2f500:2f53f] from WriteReq [2f530:2f537]
4559500: system.cpu.dcache: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest:   SF value 0.0
4559500: system.tol2bus.snoop_filter: lookupRequest:   new SF value 2.0
4559500: system.l2: access for ReadExReq [2f500:2f53f] miss
4559500: system.l2: createMissPacket: created ReadExReq [2f500:2f53f] from ReadExReq [2f500:2f53f]
4559500: system.l2: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest:   SF value 0.0
4559500: system.membus.snoop_filter: lookupRequest:   new SF value 1.0
4559500: system.membus.snoop_filter: updateResponse: src system.membus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.membus.snoop_filter: updateResponse:   old SF value 1.0
4559500: system.membus.snoop_filter: updateResponse:   new SF value 0.1
4559500: system.l2: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.l2.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.l2.tags: set 2, way 5, superblock offset 4: selecting blk for replacement
4559500: system.l2: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 2f
===+===
4559500: system.tol2bus.snoop_filter: updateResponse: src system.tol2bus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: updateResponse:   old SF value 2.0
4559500: system.tol2bus.snoop_filter: updateResponse:   new SF value 0.2
4559500: system.cpu.dcache: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.cpu.dcache.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.cpu.dcache.tags: set 17a, way 0, superblock offset 4: selecting blk for replacement
4559500: system.cpu.dcache: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 0
===+===
6682500: system.cpu.dcache: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
===+===
6682500: system.l2: access for WritebackDirty [2f4c0:2f4ff] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 416 bits. Compression latency: 13, decompression latency: 8
6682500: system.l2.tags: set 2, way 5, superblock offset 3: selecting blk for replacement
6682500: system.l2: Create CleanEvict CleanEvict [2f500:2f53f]
6682500: system.l2: access new state is state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 tag: 2f
6682500: system.tol2bus.snoop_filter: lookupSnoop: packet CleanEvict [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupSnoop:   old SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupSnoop:   new SF value 0.2 interest: 2
6682500: system.cpu.dcache: handleSnoop: for CleanEvict [2f500:2f53f]
6682500: system.cpu.dcache: handleSnoop: snoop miss for CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest:   SF value 0.1
6682500: system.membus.snoop_filter: lookupRequest:   new SF value 0.0
6682500: system.membus.snoop_filter: eraseIfNullEntry:   Removed SF entry.
6682500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet WritebackDirty [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupRequest:   SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupRequest:   new SF value 0.0
6682500: system.tol2bus.snoop_filter: eraseIfNullEntry:   Removed SF entry.
6682500: system.l2: access for WritebackDirty [2f500:2f53f] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 200 bits. Compression latency: 13, decompression latency: 8
===+===
6709500: system.cpu.dcache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.icache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.itb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.dtb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.l2: functionalAccess: WriteReq [2f500:2f51f] valid
===+===
6762500: system.l2.tags: set 2, way 6, superblock offset 3: selecting blk for replacement
6762500: system.l2: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
6762500: system.l2: Block addr 0xc4c0 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: c
6762500: system.tol2bus.snoop_filter: lookupSnoop: packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest:   SF value 0.0



Comments on what I think is going on during execution:
4559500: First miss that inserts addr 2f500 in the caches. For the L2 cache it is inserted at set 2, way 5, superblock offset 4 in compressed format (32 <= 512/8). For the L1D cache it is inserted at set 17A, way 0, superblock offset 3 also in compressed format (compression is being used in the whole cache hierarchy).

6682500: Access to 2f4c0 generates a miss which replaces set 2, way 5, offset 3. This is a superblock tag conflict, so 2f500 must be evicted. L1D's 2f500 became dirty at some point (can't track this), so a WritebackDirty is needed. Why did it generate a CleanEvict too?

6709500: The L1D(?) copy is modified (write), but the copy had been evicted, and no calls to insert were made. How is this possible? Does it mean that it was a functional insert that skipped the compression process?

6762500: A replacement is caused by address c4c0. I don't know what causes this writeback, since at no point was 2f500's superblock re-inserted. Maybe it was due to the functional access magic.

Could anyone help me understand the snoop filtering process (who is the requester in the debugging lines?), and check if my comments on what is hapenning actually makes sense?

Thank you in advance!
Nikos Nikoleris
2018-08-06 17:18:44 UTC
Permalink
Hi Daniel

It seems that there are two different evictions taking place.


1. First the dcache needs to allocate and for that it evicts the dirty line:
6682500: system.cpu.dcache: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
I am not quite sure what happens to it, I am guessing you omitted the lines that follow. But I would expect that it finds the block in the L2, updates it and set the dirty bit.


1. Then the L2 cache needs to allocate and similarly picks the line in question of replacement. It creates a clean eviction although I would expect to see a WritebackDirty.
6682500: system.l2: Create CleanEvict CleanEvict [2f500:2f53f]

My impression is that there is some problem with the first WritebackDirty and that’s what triggers a snoop filter assertion which seems to have observed a double eviction for the line [2f000:2f53f].

Nikos

From: gem5-users <gem5-users-***@gem5.org> on behalf of "***@yahoo.com.br Daniel Carvalho" <***@yahoo.com.br>
Reply-To: Gem5 Users Mailing List <gem5-***@gem5.org>
Date: Monday, 6 August 2018 at 16:14
To: Gem5 Users Mailing List <gem5-***@gem5.org>
Subject: [gem5-users] Understanding the SnoopFilter

Hello!

First I will give some background of what is happening. This will be used to formulate some questions.

I've got an error with the SnoopFilter after trying to implement cache compression:

panic: panic condition !(sf_item.holder & req_port) occurred: requester 1 is not a holder :( SF value 0.0
Memory Usage: 659104 KBytes
Program aborted at tick 6762500
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x28)[0x13c1948]
./build/X86/gem5.opt(_Z12abortHandleri+0x46)[0x13d44c6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f61720c3390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f61708a6428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f61708a802a]
./build/X86/gem5.opt[0x7061bf]
./build/X86/gem5.opt(_ZN11SnoopFilter13lookupRequestEPK6PacketRK9SlavePort+0xc7d)[0x77ec6d]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x107)[0x74b0d7]
./build/X86/gem5.opt(_ZN5Cache18doWritebacksAtomicERNSt7__cxx114listIP6PacketSaIS3_EEE+0xc3)[0x14aa163]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x392)[0x14a55a2]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x686)[0x74b656]
./build/X86/gem5.opt(_ZN5Cache19handleAtomicReqMissEP6PacketP8CacheBlkRNSt7__cxx114listIS1_SaIS1_EEE+0x149)[0x14ab569]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x41d)[0x14a562d]
./build/X86/gem5.opt(_ZN15AtomicSimpleCPU4tickEv+0x8cf)[0x147c02f]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xc5)[0x13c8135]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x50)[0x13e01c0]
./build/X86/gem5.opt(_Z8simulatem+0xd1b)[0x13e12ab]

I am using the following line to run Gem5:
./build/X86/gem5.opt --debug-flags=SnoopFilter,CacheAll ./configs/example/se.py -c <path_to_executable> --caches --l2cache --l2_size=4096

A superblock is a set of blocks that share a single tag. In this execution I allow a maximum of 8 blocks per superblock, although it can happen to other values too. The error only happens if there can be multiple instances of the same superblock coexisting in the cache (i.e., 2 blocks per superblock and that are not co-allocatable would generate two superblock entries, each with one of the blocks, but both with the same tag), however it is not certain to happen just because they coexist. This is likely to be caused by an insertion that is not being informed correctly.

Setting the debug flags to CacheAll,SnoopFilter generates the following output (isolated for the address that causes the error; "===+===" was inserted by me to inform that there are lines relative to other addresses in between).

4559500: system.cpu.dcache: createMissPacket: created ReadExReq [2f500:2f53f] from WriteReq [2f530:2f537]
4559500: system.cpu.dcache: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest: SF value 0.0
4559500: system.tol2bus.snoop_filter: lookupRequest: new SF value 2.0
4559500: system.l2: access for ReadExReq [2f500:2f53f] miss
4559500: system.l2: createMissPacket: created ReadExReq [2f500:2f53f] from ReadExReq [2f500:2f53f]
4559500: system.l2: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest: SF value 0.0
4559500: system.membus.snoop_filter: lookupRequest: new SF value 1.0
4559500: system.membus.snoop_filter: updateResponse: src system.membus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.membus.snoop_filter: updateResponse: old SF value 1.0
4559500: system.membus.snoop_filter: updateResponse: new SF value 0.1
4559500: system.l2: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.l2.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.l2.tags: set 2, way 5, superblock offset 4: selecting blk for replacement
4559500: system.l2: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 2f
===+===
4559500: system.tol2bus.snoop_filter: updateResponse: src system.tol2bus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: updateResponse: old SF value 2.0
4559500: system.tol2bus.snoop_filter: updateResponse: new SF value 0.2
4559500: system.cpu.dcache: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.cpu.dcache.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.cpu.dcache.tags: set 17a, way 0, superblock offset 4: selecting blk for replacement
4559500: system.cpu.dcache: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 0
===+===
6682500: system.cpu.dcache: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
===+===
6682500: system.l2: access for WritebackDirty [2f4c0:2f4ff] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 416 bits. Compression latency: 13, decompression latency: 8
6682500: system.l2.tags: set 2, way 5, superblock offset 3: selecting blk for replacement
6682500: system.l2: Create CleanEvict CleanEvict [2f500:2f53f]
6682500: system.l2: access new state is state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 tag: 2f
6682500: system.tol2bus.snoop_filter: lookupSnoop: packet CleanEvict [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupSnoop: old SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupSnoop: new SF value 0.2 interest: 2
6682500: system.cpu.dcache: handleSnoop: for CleanEvict [2f500:2f53f]
6682500: system.cpu.dcache: handleSnoop: snoop miss for CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest: SF value 0.1
6682500: system.membus.snoop_filter: lookupRequest: new SF value 0.0
6682500: system.membus.snoop_filter: eraseIfNullEntry: Removed SF entry.
6682500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet WritebackDirty [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupRequest: SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupRequest: new SF value 0.0
6682500: system.tol2bus.snoop_filter: eraseIfNullEntry: Removed SF entry.
6682500: system.l2: access for WritebackDirty [2f500:2f53f] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 200 bits. Compression latency: 13, decompression latency: 8
===+===
6709500: system.cpu.dcache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.icache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.itb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.dtb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.l2: functionalAccess: WriteReq [2f500:2f51f] valid
===+===
6762500: system.l2.tags: set 2, way 6, superblock offset 3: selecting blk for replacement
6762500: system.l2: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
6762500: system.l2: Block addr 0xc4c0 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: c
6762500: system.tol2bus.snoop_filter: lookupSnoop: packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest: SF value 0.0



Comments on what I think is going on during execution:
4559500: First miss that inserts addr 2f500 in the caches. For the L2 cache it is inserted at set 2, way 5, superblock offset 4 in compressed format (32 <= 512/8). For the L1D cache it is inserted at set 17A, way 0, superblock offset 3 also in compressed format (compression is being used in the whole cache hierarchy).

6682500: Access to 2f4c0 generates a miss which replaces set 2, way 5, offset 3. This is a superblock tag conflict, so 2f500 must be evicted. L1D's 2f500 became dirty at some point (can't track this), so a WritebackDirty is needed. Why did it generate a CleanEvict too?

6709500: The L1D(?) copy is modified (write), but the copy had been evicted, and no calls to insert were made. How is this possible? Does it mean that it was a functional insert that skipped the compression process?

6762500: A replacement is caused by address c4c0. I don't know what causes this writeback, since at no point was 2f500's superblock re-inserted. Maybe it was due to the functional access magic.

Could anyone help me understand the snoop filtering process (who is the requester in the debugging lines?), and check if my comments on what is hapenning actually makes sense?

Thank you in advance!
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Loading...