Discussion:
Deadlock problem with ALPHA FS mode
jinsong
2012-08-10 03:54:13 UTC
Hi All,
I am running PARSEC on gem5 in ALPHA FS mode. With the following commands, a deadlock problem occurred:
###gem5 command line options###
~/gem5/build/ALPHA_MOESI_hammer/gem5.opt configs/example/ruby_fs.py --cpu-type=timing -n 16 --clock='1GHz' --l1i_size=32kB --l1d_size=32kB --l2_size=16MB --num-l2caches=16 --num-dirs=16 --cacheline_size=64 --caches --ruby --topology=Mesh --garnet-network=fixed --mesh-rows=4 --kernel=vmlinux_2.6.27-gcc_4.3.4 --script=~/gem5/configs/boot/blackscholes-ckpts.rcS

###output of the run:###
...
warn: Prefetch instructions in Alpha do not do anything
warn: Prefetch instructions in Alpha do not do anything
warn: Prefetch instructions in Alpha do not do anything
hack: be nice to actually delete the event here
info: Entering event queue @ 3352057344000. Starting simulation...
Writing checkpoint
info: Entering event queue @ 3352057347000. Starting simulation...
info: Entering event queue @ 3352057347000. Starting simulation...
panic: Possible Deadlock detected. Aborting!
version: 0 request.paddr: 0x[0x6d48, line 0x6d40] m_readRequestTable: 1 current time: 3352557347 issue_time: 3352057347 difference: 500000
@ cycle 3352557347000
[wakeup:build/ALPHA_MOESI_hammer/mem/ruby/system/Sequencer.cc, line 108]
Memory Usage: 1325856 KBytes
Program aborted at cycle 3352557347000
Aborted

So how should I fix this problem? Any help greatly appreciated!

Best regards,
Song Jin
------------------------------------------------------------------------
Song Jin, Ph. D.
Department of Electronic and Communication Engineering
School of Electrical and Electronic Engineering
North China Electric Power University, P. R. China
Web: http://www.ncepu.edu.cn
------------------------------------------------------------------------
Dibakar Gope
2012-08-10 05:03:00 UTC
Turn on the ProtocolTrace, RubyGenerated, and RubySlicc debug flags and begin tracing a few cycles before the deadlock occurs (3352000000000 should be fine for your case). The trace should give you a clear idea of why the read request was never serviced by the memory system.
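
As a rough illustration of the suggestion above, the following Python sketch assembles a gem5 command line with the three debug flags enabled. It assumes the gem5 build accepts `--debug-flags`, `--debug-start` and `--debug-file` ahead of the config script; check `gem5.opt --help` for the exact option names in your version. The helper name `build_trace_command` and the trace file name `deadlock_trace.out` are made up for this example.

```python
import shlex


def build_trace_command(gem5_binary, config_script, debug_flags,
                        debug_start_tick, trace_file, script_args=()):
    """Return an argv list that runs gem5 with tracing enabled.

    Simulator options (the --debug-* ones) go before the config script;
    options for the config script itself go after it.
    """
    argv = [
        gem5_binary,
        "--debug-flags=" + ",".join(debug_flags),
        "--debug-start=%d" % debug_start_tick,  # assumed option name
        "--debug-file=" + trace_file,           # assumed option name
        config_script,
    ]
    argv.extend(script_args)
    return argv


argv = build_trace_command(
    "build/ALPHA_MOESI_hammer/gem5.opt",
    "configs/example/ruby_fs.py",
    ["ProtocolTrace", "RubyGenerated", "RubySlicc"],
    3352000000000,                 # start tick suggested in this reply
    "deadlock_trace.out",          # hypothetical output file name
    script_args=["--cpu-type=timing", "-n", "16", "--ruby"],
)

# Print a shell-ready command line for copy and paste.
print(" ".join(shlex.quote(a) for a in argv))
```

Starting the trace shortly before the failing tick keeps the output small; the three Ruby flags together can be very verbose over a full-system run.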

-Dibakar

Hao Wang
2012-08-10 16:25:53 UTC
Hey Dibakar,

So you mean the problem must lie in the mechanism of my memory system?

I also hit this problem for certain benchmarks with more CPU cores (16
cores with 4 MCs fails, while 4 cores with 2 MCs works).
But even after I changed the memory controller's scheduling policy to FIFO,
the problem still happens, which I think does not make sense.

Hao


--
------------------------------------------------------
Wang, Hao
http://homepages.cae.wisc.edu/~wangh/

Ph.D. candidate
Dept. of Electrical & Computer Engineering
University of Wisconsin, Madison

B.S. from
Department of Microelectronics
School of Electronics Engineering and Computer Science
Peking University
Dibakar Gope
2012-08-10 20:08:15 UTC
Hao,

I usually hit protocol deadlocks whenever I make changes to the coherence protocols. The sequencer in Ruby periodically scans the status of in-flight read/write requests in the MSHR tables, and if any request has not been serviced after a certain time, it flags a possible protocol deadlock and aborts the program. Because the program aborts, it is easy to trace back to the request that was not serviced and find the reason using the debug flags I mentioned. I do not know whether changing the number of MCs will help, as I haven't hit any deadlock caused by the MCs!
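
The periodic scan described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in Sequencer.cc: the function name and the table layout (line address mapped to issue time) are invented for the example, and the threshold of 500000 cycles is an assumption read off the `difference: 500000` field in the panic message.

```python
from typing import Dict, List, Tuple

# Assumed from the "difference: 500000" field of the panic message.
DEADLOCK_THRESHOLD = 500000


def find_stalled_requests(outstanding: Dict[int, int],
                          current_time: int,
                          threshold: int = DEADLOCK_THRESHOLD
                          ) -> List[Tuple[int, int]]:
    """Return (line_address, cycles_waited) for every in-flight request
    that has been waiting at least `threshold` cycles."""
    stalled = []
    for line_addr, issue_time in sorted(outstanding.items()):
        waited = current_time - issue_time
        if waited >= threshold:
            stalled.append((line_addr, waited))
    return stalled


# Values taken from the panic message in this thread, plus one
# made-up younger request that is still within the threshold.
table = {0x6d40: 3352057347, 0x8000: 3352500000}
for addr, waited in find_stalled_requests(table, 3352557347):
    print("possible deadlock: line %#x waited %d cycles" % (addr, waited))
```

Because the check only fires after the threshold has elapsed, the interesting events are near `issue_time` rather than near the abort, which is why tracing should start before the request was issued.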


Dibakar

jinsong
2012-08-11 14:01:56 UTC
Hi Dibakar:
Following your suggestion, I turned on the debug flags you mentioned and ran the simulation, and I successfully obtained the trace file. However, as a gem5 beginner, I still have no idea what causes the deadlock, even with the trace in hand. Could you please give me more detailed suggestions on how to extract the relevant information from the trace file, or on how to overcome the deadlock problem? Thank you very much!

regards,
Song Jin



Nilay Vaish
2012-08-12 16:26:26 UTC
Song Jin, in the error message that you posted, the address [0x6d48,
line 0x6d40] was reported as the one involved in the deadlock. In the
trace you obtained, try to backtrack through the coherence protocol
transitions for this particular address. Try to work out whether those
transitions make sense, or whether something is wrong with the coherence
protocol or some other component of the simulator.
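
One way to start the backtracking is to pull out only the trace lines that mention the stalled cache line. The sketch below makes no assumption about the trace layout beyond the address appearing in hexadecimal with a `0x` prefix, as it does in the panic message; the function names and the sample lines are invented for illustration. The 64-byte block size comes from `--cacheline_size=64` in the original command.

```python
import re
from typing import Iterable, Iterator


def line_address(paddr: int, block_size: int = 64) -> int:
    """Round a physical address down to the start of its cache line."""
    return paddr & ~(block_size - 1)


def lines_for_address(lines: Iterable[str], line_addr: int) -> Iterator[str]:
    """Yield the trace lines that mention `line_addr` as a whole hex token."""
    pattern = re.compile(
        r"(?<![0-9A-Za-z])0x0*%x(?![0-9A-Fa-f])" % line_addr,
        re.IGNORECASE,
    )
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Made-up sample lines; a real trace will look different.
sample = [
    "100 L1Cache Load [0x6d40, line 0x6d40]\n",
    "101 L1Cache Load [0x6d400, line 0x6d400]\n",
    "102 Directory GETS [0x16d40, line 0x16d40]\n",
]
target = line_address(0x6d48)
matches = list(lines_for_address(sample, target))
print(hex(target), matches)
```

For a real trace, pass an open file object instead of `sample` and read the matching lines from the last one backwards, checking which controller was expected to respond next.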

--
Nilay
