Daniel Carvalho
2018-08-06 15:13:56 UTC
Hello!
First I will give some background of what is happening. This will be used to formulate some questions.
I've got an error with the SnoopFilter after trying to implement cache compression:
panic: panic condition !(sf_item.holder & req_port) occurred: requester 1 is not a holder :( SF value 0.0
Memory Usage: 659104 KBytes
Program aborted at tick 6762500
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x28)[0x13c1948]
./build/X86/gem5.opt(_Z12abortHandleri+0x46)[0x13d44c6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f61720c3390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f61708a6428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f61708a802a]
./build/X86/gem5.opt[0x7061bf]
./build/X86/gem5.opt(_ZN11SnoopFilter13lookupRequestEPK6PacketRK9SlavePort+0xc7d)[0x77ec6d]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x107)[0x74b0d7]
./build/X86/gem5.opt(_ZN5Cache18doWritebacksAtomicERNSt7__cxx114listIP6PacketSaIS3_EEE+0xc3)[0x14aa163]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x392)[0x14a55a2]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x686)[0x74b656]
./build/X86/gem5.opt(_ZN5Cache19handleAtomicReqMissEP6PacketP8CacheBlkRNSt7__cxx114listIS1_SaIS1_EEE+0x149)[0x14ab569]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x41d)[0x14a562d]
./build/X86/gem5.opt(_ZN15AtomicSimpleCPU4tickEv+0x8cf)[0x147c02f]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xc5)[0x13c8135]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x50)[0x13e01c0]
./build/X86/gem5.opt(_Z8simulatem+0xd1b)[0x13e12ab]
I am using the following line to run Gem5:
   ./build/X86/gem5.opt --debug-flags=SnoopFilter,CacheAll ./configs/example/se.py -c <path_to_executable> --caches --l2cache --l2_size=4096
A superblock is a set of blocks that share a single tag. In this execution I allow a maximum of 8 blocks per superblock, although it can happen to other values too. The error only happens if there can be multiple instances of the same superblock coexisting in the cache (i.e., 2 blocks per superblock and that are not co-allocatable would generate two superblock entries, each with one of the blocks, but both with the same tag), however it is not certain to happen just because they coexist. This is likely to be caused by an insertion that is not being informed correctly.
Setting the debug flags to CacheAll,SnoopFilter generates the following output (isolated for the address that causes the error; "===+===" was inserted by me to inform that there are lines relative to other addresses in between).
4559500: system.cpu.dcache: createMissPacket: created ReadExReq [2f500:2f53f] from WriteReq [2f530:2f537]
4559500: system.cpu.dcache: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest:Â Â SF value 0.0
4559500: system.tol2bus.snoop_filter: lookupRequest:Â Â new SF value 2.0
4559500: system.l2: access for ReadExReq [2f500:2f53f] miss
4559500: system.l2: createMissPacket: created ReadExReq [2f500:2f53f] from ReadExReq [2f500:2f53f]
4559500: system.l2: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest:Â Â SF value 0.0
4559500: system.membus.snoop_filter: lookupRequest:Â Â new SF value 1.0
4559500: system.membus.snoop_filter: updateResponse: src system.membus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.membus.snoop_filter: updateResponse:Â Â old SF value 1.0
4559500: system.membus.snoop_filter: updateResponse:Â Â new SF value 0.1
4559500: system.l2: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.l2.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.l2.tags: set 2, way 5, superblock offset 4: selecting blk for replacement
4559500: system.l2: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 2f
===+===
4559500: system.tol2bus.snoop_filter: updateResponse: src system.tol2bus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: updateResponse:Â Â old SF value 2.0
4559500: system.tol2bus.snoop_filter: updateResponse:Â Â new SF value 0.2
4559500: system.cpu.dcache: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.cpu.dcache.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.cpu.dcache.tags: set 17a, way 0, superblock offset 4: selecting blk for replacement
4559500: system.cpu.dcache: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 0
===+===
6682500: system.cpu.dcache: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
===+===
6682500: system.l2: access for WritebackDirty [2f4c0:2f4ff] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 416 bits. Compression latency: 13, decompression latency: 8
6682500: system.l2.tags: set 2, way 5, superblock offset 3: selecting blk for replacement
6682500: system.l2: Create CleanEvict CleanEvict [2f500:2f53f]
6682500: system.l2: access new state is state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 tag: 2f
6682500: system.tol2bus.snoop_filter: lookupSnoop: packet CleanEvict [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupSnoop:Â Â old SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupSnoop:Â Â new SF value 0.2 interest: 2
6682500: system.cpu.dcache: handleSnoop: for CleanEvict [2f500:2f53f]
6682500: system.cpu.dcache: handleSnoop: snoop miss for CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest:Â Â SF value 0.1
6682500: system.membus.snoop_filter: lookupRequest:Â Â new SF value 0.0
6682500: system.membus.snoop_filter: eraseIfNullEntry:Â Â Removed SF entry.
6682500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet WritebackDirty [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupRequest:Â Â SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupRequest:Â Â new SF value 0.0
6682500: system.tol2bus.snoop_filter: eraseIfNullEntry:Â Â Removed SF entry.
6682500: system.l2: access for WritebackDirty [2f500:2f53f] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 200 bits. Compression latency: 13, decompression latency: 8
===+===
6709500: system.cpu.dcache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.icache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.itb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.dtb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.l2: functionalAccess: WriteReq [2f500:2f51f] valid
===+===
6762500: system.l2.tags: set 2, way 6, superblock offset 3: selecting blk for replacement
6762500: system.l2: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
6762500: system.l2: Block addr 0xc4c0 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: c
6762500: system.tol2bus.snoop_filter: lookupSnoop: packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest:Â Â SF value 0.0
Comments on what I think is going on during execution:
4559500: First miss that inserts addr 2f500 in the caches. For the L2 cache it is inserted at set 2, way 5, superblock offset 4 in compressed format (32 <= 512/8). For the L1D cache it is inserted at set 17A, way 0, superblock offset 3 also in compressed format (compression is being used in the whole cache hierarchy).
6682500: Access to 2f4c0 generates a miss which replaces set 2, way 5, offset 3. This is a superblock tag conflict, so 2f500 must be evicted. L1D's 2f500 became dirty at some point (can't track this), so a WritebackDirty is needed. Why did it generate a CleanEvict too?
6709500: The L1D(?) copy is modified (write), but the copy had been evicted, and no calls to insert were made. How is this possible? Does it mean that it was a functional insert that skipped the compression process?
6762500: A replacement is caused by address c4c0. I don't know what causes this writeback, since at no point was 2f500's superblock re-inserted. Maybe it was due to the functional access magic.
Could anyone help me understand the snoop filtering process (who is the requester in the debugging lines?), and check if my comments on what is hapenning actually makes sense?
Thank you in advance!
First I will give some background of what is happening. This will be used to formulate some questions.
I've got an error with the SnoopFilter after trying to implement cache compression:
panic: panic condition !(sf_item.holder & req_port) occurred: requester 1 is not a holder :( SF value 0.0
Memory Usage: 659104 KBytes
Program aborted at tick 6762500
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x28)[0x13c1948]
./build/X86/gem5.opt(_Z12abortHandleri+0x46)[0x13d44c6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f61720c3390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f61708a6428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f61708a802a]
./build/X86/gem5.opt[0x7061bf]
./build/X86/gem5.opt(_ZN11SnoopFilter13lookupRequestEPK6PacketRK9SlavePort+0xc7d)[0x77ec6d]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x107)[0x74b0d7]
./build/X86/gem5.opt(_ZN5Cache18doWritebacksAtomicERNSt7__cxx114listIP6PacketSaIS3_EEE+0xc3)[0x14aa163]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x392)[0x14a55a2]
./build/X86/gem5.opt(_ZN12CoherentXBar10recvAtomicEP6Packets+0x686)[0x74b656]
./build/X86/gem5.opt(_ZN5Cache19handleAtomicReqMissEP6PacketP8CacheBlkRNSt7__cxx114listIS1_SaIS1_EEE+0x149)[0x14ab569]
./build/X86/gem5.opt(_ZN9BaseCache10recvAtomicEP6Packet+0x41d)[0x14a562d]
./build/X86/gem5.opt(_ZN15AtomicSimpleCPU4tickEv+0x8cf)[0x147c02f]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xc5)[0x13c8135]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x50)[0x13e01c0]
./build/X86/gem5.opt(_Z8simulatem+0xd1b)[0x13e12ab]
I am using the following line to run Gem5:
   ./build/X86/gem5.opt --debug-flags=SnoopFilter,CacheAll ./configs/example/se.py -c <path_to_executable> --caches --l2cache --l2_size=4096
A superblock is a set of blocks that share a single tag. In this execution I allow a maximum of 8 blocks per superblock, although it can happen to other values too. The error only happens if there can be multiple instances of the same superblock coexisting in the cache (i.e., 2 blocks per superblock and that are not co-allocatable would generate two superblock entries, each with one of the blocks, but both with the same tag), however it is not certain to happen just because they coexist. This is likely to be caused by an insertion that is not being informed correctly.
Setting the debug flags to CacheAll,SnoopFilter generates the following output (isolated for the address that causes the error; "===+===" was inserted by me to inform that there are lines relative to other addresses in between).
4559500: system.cpu.dcache: createMissPacket: created ReadExReq [2f500:2f53f] from WriteReq [2f530:2f537]
4559500: system.cpu.dcache: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: lookupRequest:Â Â SF value 0.0
4559500: system.tol2bus.snoop_filter: lookupRequest:Â Â new SF value 2.0
4559500: system.l2: access for ReadExReq [2f500:2f53f] miss
4559500: system.l2: createMissPacket: created ReadExReq [2f500:2f53f] from ReadExReq [2f500:2f53f]
4559500: system.l2: handleAtomicReqMiss: Sending an atomic ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet ReadExReq [2f500:2f53f]
4559500: system.membus.snoop_filter: lookupRequest:Â Â SF value 0.0
4559500: system.membus.snoop_filter: lookupRequest:Â Â new SF value 1.0
4559500: system.membus.snoop_filter: updateResponse: src system.membus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.membus.snoop_filter: updateResponse:Â Â old SF value 1.0
4559500: system.membus.snoop_filter: updateResponse:Â Â new SF value 0.1
4559500: system.l2: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.l2.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.l2.tags: set 2, way 5, superblock offset 4: selecting blk for replacement
4559500: system.l2: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 2f
===+===
4559500: system.tol2bus.snoop_filter: updateResponse: src system.tol2bus.slave[1] packet ReadExResp [2f500:2f53f]
4559500: system.tol2bus.snoop_filter: updateResponse:Â Â old SF value 2.0
4559500: system.tol2bus.snoop_filter: updateResponse:Â Â new SF value 0.2
4559500: system.cpu.dcache: handleAtomicReqMiss: Receive response: ReadExResp [2f500:2f53f] in state 0
4559500: system.cpu.dcache.compressor: Compressed cache line from 512 to 32 bits. Compression latency: 13, decompression latency: 8
4559500: system.cpu.dcache.tags: set 17a, way 0, superblock offset 4: selecting blk for replacement
4559500: system.cpu.dcache: Block addr 0x2f500 (ns) moving from state 0 to state: 87 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: 0
===+===
6682500: system.cpu.dcache: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
===+===
6682500: system.l2: access for WritebackDirty [2f4c0:2f4ff] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 416 bits. Compression latency: 13, decompression latency: 8
6682500: system.l2.tags: set 2, way 5, superblock offset 3: selecting blk for replacement
6682500: system.l2: Create CleanEvict CleanEvict [2f500:2f53f]
6682500: system.l2: access new state is state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 tag: 2f
6682500: system.tol2bus.snoop_filter: lookupSnoop: packet CleanEvict [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupSnoop:Â Â old SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupSnoop:Â Â new SF value 0.2 interest: 2
6682500: system.cpu.dcache: handleSnoop: for CleanEvict [2f500:2f53f]
6682500: system.cpu.dcache: handleSnoop: snoop miss for CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet CleanEvict [2f500:2f53f]
6682500: system.membus.snoop_filter: lookupRequest:Â Â SF value 0.1
6682500: system.membus.snoop_filter: lookupRequest:Â Â new SF value 0.0
6682500: system.membus.snoop_filter: eraseIfNullEntry:Â Â Removed SF entry.
6682500: system.tol2bus.snoop_filter: lookupRequest: src system.tol2bus.slave[1] packet WritebackDirty [2f500:2f53f]
6682500: system.tol2bus.snoop_filter: lookupRequest:Â Â SF value 0.2
6682500: system.tol2bus.snoop_filter: lookupRequest:Â Â new SF value 0.0
6682500: system.tol2bus.snoop_filter: eraseIfNullEntry:Â Â Removed SF entry.
6682500: system.l2: access for WritebackDirty [2f500:2f53f] miss
6682500: system.l2.compressor: Compressed cache line from 512 to 200 bits. Compression latency: 13, decompression latency: 8
===+===
6709500: system.cpu.dcache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.icache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.itb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.cpu.dtb_walker_cache: functionalAccess: WriteReq [2f500:2f51f]
6709500: system.l2: functionalAccess: WriteReq [2f500:2f51f] valid
===+===
6762500: system.l2.tags: set 2, way 6, superblock offset 3: selecting blk for replacement
6762500: system.l2: Create Writeback WritebackDirty [2f500:2f53f] writable: 1, dirty: 1
6762500: system.l2: Block addr 0xc4c0 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: c
6762500: system.tol2bus.snoop_filter: lookupSnoop: packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest: src system.membus.slave[1] packet WritebackDirty [2f500:2f53f]
6762500: system.membus.snoop_filter: lookupRequest:Â Â SF value 0.0
Comments on what I think is going on during execution:
4559500: First miss that inserts addr 2f500 in the caches. For the L2 cache it is inserted at set 2, way 5, superblock offset 4 in compressed format (32 <= 512/8). For the L1D cache it is inserted at set 17A, way 0, superblock offset 3 also in compressed format (compression is being used in the whole cache hierarchy).
6682500: Access to 2f4c0 generates a miss which replaces set 2, way 5, offset 3. This is a superblock tag conflict, so 2f500 must be evicted. L1D's 2f500 became dirty at some point (can't track this), so a WritebackDirty is needed. Why did it generate a CleanEvict too?
6709500: The L1D(?) copy is modified (write), but the copy had been evicted, and no calls to insert were made. How is this possible? Does it mean that it was a functional insert that skipped the compression process?
6762500: A replacement is caused by address c4c0. I don't know what causes this writeback, since at no point was 2f500's superblock re-inserted. Maybe it was due to the functional access magic.
Could anyone help me understand the snoop filtering process (who is the requester in the debugging lines?), and check if my comments on what is hapenning actually makes sense?
Thank you in advance!