Discussion:
[gem5-users] dist-gem5 panic - No 32bit reads implemented for this device.
Afoakwa, Richard
2018-08-30 14:57:36 UTC
Permalink
Hi all, this is my first time using dist-gem5, but I have a working knowledge of gem5.


I have everything set up correctly, I think, but I keep getting the following panic message: "No 32bit reads implemented for this device. Offset 0x44", and I have run out of ideas to fix or work around it.


The testsys.terminal output suggests that the images are all loaded correctly and things run fine until it gets to launching the application. I have updated the image to include the MPI libraries so that I can call mpirun (armv8-linux-gnueabi-mpirun). When I boot the image in a VM, I can run the application just fine with mpirun. But I keep getting this panic message when it is run inside dist-gem5.


I am using an arm64 setup. The image is aarch64-ubuntu-trusty-headless.img, the kernel is vmlinux.aarch64.20140821, and the dtb is vexpress.aarch64.20140821.dtb.


Here are the text outputs:


***** rcS *****


# --------------------------------------------
# ------ Start your tests below ... ---------
# --------------------------------------------
## Start workload
NUM_CORES=$(/sbin/m5 initparam num-cpus)
echo "Num-Cores: $NUM_CORES"

echo "[RKA] Load modules and set omp threads..."
export OMP_NUM_THREADS=$NUM_CORES #Number of threads to use

echo "[RKA] Start work..."

if [ "$MY_RANK" == "0" ]
then
    echo "[RKA] Stats dump and rest..."
    /sbin/m5 dumpstats
    /sbin/m5 resetstats

    echo "[RKA] Starting workload..."

    cd /benchmarks/lulesh

    mpirun -np ${MY_SIZE} ./lulesh2.0 -s 5 -i 10

    /sbin/m5 exit 1
else
    printf "Wait for main to finish ...\n"
    while /bin/true
    do
        sleep 5
        printf "."
    done
fi


***** m5out.0/testsys.terminal *****


[RKA] bootscript.rcS running

[RKA] Rank: 0

[RKA] Size: 2

[RKA] Address: 02

[RKA] Set ethernet config...

[ 3.600382] CPU3: failed to come online

[RKA] Display updated config...

eth0 Link encap:Ethernet HWaddr 00:90:00:00:00:02

inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0

UP BROADCAST MULTICAST MTU:1500 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)


lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)


Preparing hosts for mpirun. Rank: 0 of 2

PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.003 ms


--- 192.168.0.2 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 0.003/0.003/0.003/0.000 ms

PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.

64 bytes from 192.168.0.3: icmp_seq=1 ttl=64 time=997 ms


--- 192.168.0.3 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 997.900/997.900/997.900/0.000 ms

Num-Cores: 2

[RKA] Load modules and set omp threads...

[RKA] Start work...

[RKA] Stats dump and rest...

[RKA] Starting workload...

[ 4.620381] CPU2: failed to come online



***** m5out.1/testsys.terminal *****


[RKA] bootscript.rcS is running

[RKA] Rank: 1

[RKA] Size: 2

[RKA] Address: 03

[RKA] Set ethernet config...

[ 3.600382] CPU3: failed to come online

[RKA] Display updated config...

eth0 Link encap:Ethernet HWaddr 00:90:00:00:00:03

inet addr:192.168.0.3 Bcast:192.168.0.255 Mask:255.255.255.0

UP BROADCAST MULTICAST MTU:1500 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)


lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

UP LOOPBACK RUNNING MTU:65536 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)


Preparing hosts for mpirun. Rank: 1 of 2

Num-Cores: 2

[RKA] Load modules and set omp threads...

[RKA] Start work...

Wait for main to finish ...

[ 4.620382] CPU2: failed to come online


***** Log.0 *****


command line: gem5-dist/000.init/util/dist/test/./../../../build/ARM/gem5.opt -d gem5-dist/000.init/util/dist/test/m5out.0 --debug-flags=EthernetAll,DistEthernetAll gem5-dist/000.init/util/dist/test/./../../../configs/example/fs.py --cpu-type=AtomicSimpleCPU --num-cpus=2 --machine-type=VExpress_EMM64 --disk-image=aarch64-ubuntu-trusty-headless.img --kernel=vmlinux.aarch64.20140821 --dtb-filename=vexpress.aarch64.20140821.dtb --script=gem5-dist/000.init/util/dist/test/./../../../util/dist/test/bootscript.rcS --checkpoint-dir=gem5-dist/000.init/util/dist/test/m5out.0 --dist --dist-rank=0 --dist-size=2 --dist-server-name=bhx0062 --dist-server-port=2200


info: Standard input is not a terminal, disabling listeners.

Global frequency set at 1000000000000 ticks per second

0: etherlink: Switch Link created. Delay: 10000000, Speed: 800

0: global: DistIface() ctor rank:0

warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)

info: kernel located at: gem5-dist/full_system/binaries/vmlinux.aarch64.20140821

warn: Highest ARM exception-level set to AArch32 but bootloader is for AArch64. Assuming you wanted these to match.

warn: Sockets disabled, not accepting vnc client connections

warn: Sockets disabled, not accepting terminal connections

0: etherlink: DistEtherLink::init() called

…

…

…

18290945047000: testsys.realview.ethernet: Checking interrupts icr: 0 imr: 0x9d

18290945047000: testsys.realview.ethernet: Mask cleaned all interrupts

18290945047000: testsys.realview.ethernet: ITR = 0XC3 itr.interval = 0XC3

panic: No 32bit reads implemented for this device. Offset 0x44

Memory Usage: 1243356 KBytes

Program aborted at tick 18372912712000

--- BEGIN LIBC BACKTRACE ---


***** log.1 *****


command line: gem5-dist/000.init/util/dist/test/./../../../build/ARM/gem5.opt -d gem5-dist/000.init/util/dist/test/m5out.1 --debug-flags=EthernetAll,DistEthernetAll gem5-dist/000.init/util/dist/test/./../../../configs/example/fs.py --cpu-type=AtomicSimpleCPU --num-cpus=2 --machine-type=VExpress_EMM64 --disk-image=aarch64-ubuntu-trusty-headless.img --kernel=vmlinux.aarch64.20140821 --dtb-filename=vexpress.aarch64.20140821.dtb --script=gem5-dist/000.init/util/dist/test/./../../../util/dist/test/bootscript.rcS --checkpoint-dir=gem5-dist/000.init/util/dist/test/m5out.1 --dist --dist-rank=1 --dist-size=2 --dist-server-name=bhx0062 --dist-server-port=2200


info: Standard input is not a terminal, disabling listeners.

Global frequency set at 1000000000000 ticks per second

0: etherlink: Switch Link created. Delay: 10000000, Speed: 800

0: global: DistIface() ctor rank:1

warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)

info: kernel located at: gem5-dist/full_system/binaries/vmlinux.aarch64.20140821

warn: Highest ARM exception-level set to AArch32 but bootloader is for AArch64. Assuming you wanted these to match.

warn: Sockets disabled, not accepting vnc client connections

warn: Sockets disabled, not accepting terminal connections

0: etherlink: DistEtherLink::init() called


…

…

…

18290981199500: testsys.realview.ethernet: ITR = 0XCD itr.interval = 0XCD

18290982340500: testsys.realview.ethernet: Checking interrupts icr: 0 imr: 0x9d

18290982340500: testsys.realview.ethernet: Mask cleaned all interrupts

18290982340500: testsys.realview.ethernet: ITR = 0XC3 itr.interval = 0XC3

info: recv(): Connection closed

Exiting @ tick 18372920000000 because connection to gem5 peer got closed



Any help would be appreciated.


Thanks,

Richard
Gabor Dozsa
2018-08-30 16:20:53 UTC
Permalink
Hi Richard,

I would suggest that you first try running the same MPI app on a single simulated system to see whether the issue is specific to dist-gem5. Simply use vanilla gem5 instead of dist-gem5 with exactly the same configuration (e.g. gem5 flags, kernel, disk image, etc.). You will need to remove the dist-gem5 and ethernet config commands from the bootscript, but the mpirun command line should just work as it is.
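
For illustration only, here is a rough sketch of how the dist-gem5 options could be dropped from an existing fs.py argument list while keeping everything else unchanged. The helper name strip_dist_flags is hypothetical; the flags are the ones that appear in the command line in the log earlier in this thread, and the sketch assumes every dist-gem5 option starts with "--dist".

```shell
# Hypothetical helper: print every argument except the dist-gem5 ones,
# one per line. Assumes all dist-gem5 options start with "--dist".
strip_dist_flags() {
    for arg in "$@"; do
        case $arg in
            --dist|--dist-*) ;;              # dist-gem5 only: drop it
            *) printf '%s\n' "$arg" ;;       # keep everything else as-is
        esac
    done
}

# Example with a subset of the options from the log above.
strip_dist_flags \
    --cpu-type=AtomicSimpleCPU --num-cpus=2 --machine-type=VExpress_EMM64 \
    --dist --dist-rank=0 --dist-size=2 \
    --dist-server-name=bhx0062 --dist-server-port=2200
```

The remaining options can then be passed to a single gem5 process, so that the kernel, disk image and CPU settings stay identical between the two runs.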

- Gabor

Afoakwa, Richard
2018-09-07 19:10:48 UTC
Permalink
Gabor,

Thanks very much for your response. Using the vanilla version, it appears that the error was caused by my not including the host IP address(es) in my mpirun calls. Thanks again.

As a secondary question, I am trying to understand the basic framework of dist-gem5. From what I infer, the "gem5-dist.sh" script launches gem5 FS processes (using the same *.rcS script and Linux image) on dedicated machines. Using the *.rcS script, each gem5 process updates the network configuration of the "image". For example, in the tutorials, this is done using the line:

/sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED} 10.0.0.${MY_ADDR}
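
As a minimal sketch of how the two variables in that line might be derived, the snippet below assumes the address is simply the rank plus 2, which matches the terminal output earlier in this thread (rank 0 has address 02, rank 1 has address 03). The helper names node_addr and node_addr_padded are hypothetical, and the real bootscript may compute these differently.

```shell
# Hypothetical helper: map a dist-gem5 rank to the last address octet.
# Assumption: address = rank + 2 (rank 0 -> 2, rank 1 -> 3).
node_addr() {
    echo $(( $1 + 2 ))
}

# Zero-pad to two digits so the value can serve as the last MAC byte.
# Only meaningful while the address stays below 100.
node_addr_padded() {
    printf '%02d\n' "$(node_addr "$1")"
}

MY_RANK=${MY_RANK:-0}    # provided by the bootscript environment in a real run
MY_ADDR=$(node_addr "$MY_RANK")
MY_ADDR_PADDED=$(node_addr_padded "$MY_RANK")

# Print the command instead of running it, so the sketch is safe on a host.
echo "/sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED} 10.0.0.${MY_ADDR}"
```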

Subsequently, the base gem5 process (the one with RANK = 0) can ping the other processes (as is evident in the tutorial screenshot). Assuming all this works without error and I am trying to run an MPI application, the RANK 0 gem5 process needs a list of hosts to execute mpirun. As noted earlier, mpirun will fail in gem5 without a list of hosts.

For this purpose, I pass the list of host IP addresses, 10.0.0.${MY_ADDR}, to mpirun. But I keep receiving "connection refused" error messages after mpirun starts. Trying different ports does not work either.
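
To make the host list concrete, here is a small sketch of building it from the node count. It assumes the node addresses are consecutive and start at .2, as in the terminal output earlier in this thread; build_host_list is a hypothetical helper. The mpirun spelling in the comment is the Open MPI one, which is also an assumption, since the MPI implementation is not stated here (MPICH's Hydra launcher takes -hosts instead).

```shell
# Hypothetical helper: print "prefix.first,prefix.first+1,..." for count nodes.
# Assumption: node addresses are consecutive, one per rank.
build_host_list() {
    prefix=$1
    addr=$2
    remaining=$3
    hosts=""
    while [ "$remaining" -gt 0 ]; do
        hosts="${hosts:+$hosts,}${prefix}.${addr}"
        addr=$((addr + 1))
        remaining=$((remaining - 1))
    done
    printf '%s\n' "$hosts"
}

MY_SIZE=${MY_SIZE:-2}    # number of simulated nodes
HOSTS=$(build_host_list 10.0.0 2 "$MY_SIZE")
echo "Host list: $HOSTS"

# Open MPI style invocation (assumption), left commented out:
# mpirun -np "$MY_SIZE" --host "$HOSTS" ./lulesh2.0 -s 5 -i 10
```

Note that a correct host list only tells mpirun where to start ranks. The launcher still has to be able to start a process on each remote node, so it may be worth checking that step separately.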

I would be grateful if anyone can provide some direction on this. Thanks.

Richard



Mohammad Alian
2018-09-07 21:02:30 UTC
Permalink
Hi Richard,

I can imagine what you did based on the explanation, but can you post the
rcS script that you are using so we can better help you?
BTW, running an application on dist-gem5 is no different from running it
on a physical system. So I recommend that you first run your application on a
physical setup, make sure that you have all the command-line arguments
correct, and then proceed with the dist-gem5 simulations, because trial and
error on dist-gem5 can take a lot of time.

Best,
Mohammad

On Fri, Sep 7, 2018 at 2:10 PM Afoakwa, Richard <***@ur.rochester.edu>
wrote:

> Gabor,
>
> Thanks very much for your response. Using the vanilla version, it appears
> to me that the error message was due to the fact that I was *not
> including* the host ip address(es) with my mpirun calls. Thanks again.
>
> As a secondary question. I am trying to understand the basic framework of
> dist-gem5. From what I infer "gem5-dist.sh" script launches gem5 FS
> processes (using the same *.rcS script and linux image) onto dedicated
> machines. Using the *.rcS script, each gem5 process updates the network
> configuration of the "image". For example, in the tutorials, this is done
> using the line;
>
> /sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED}
> 10.0.0.${MY_ADDR}
>
> Subsequently, the base gem5 process (the one with RANK = 0), can ping the
> other processes (as evident in the tutorial screenshot). Assuming all this
> works without error, and am trying to run an mpi application, the RANK0
> gem5 process needs a list of hosts to execute mpirun. As noted earlier,
> mpirun in will fail in gem5 without a list of hosts.
>
> For this purpose, pass the list of host ip address, 10.0.0.${MY_ADDR}, to
> mpirun. But I keep receiving *connection refused* error messages after
> mpirun starts. Trying different ports does not work either.
>
> I would be grateful if anyone can provide some direction on this. Thanks.
>
> Richard
>
>
> ------------------------------
> *From:* gem5-users <gem5-users-***@gem5.org> on behalf of Gabor Dozsa
> <***@arm.com>
> *Sent:* Thursday, August 30, 2018 12:20:53 PM
> *To:* gem5 users mailing list
> *Subject:* Re: [gem5-users] dist-gem5 panic - No 32bit reads implemented
> for this device.
>
>
> Hi Richard,
>
>
>
> I would suggest you to try to run the same MPI app on a single simulated
> system first to see if it is a dist-gem5 specific issue or not. Simply use
> vanilla gem5 instead of dist-gem5 with exactly the same configuration (e.g.
> gem5 flags, kernel, disk image, etc.). You will need to remove the
> dist-gem5 and ethernet config commands from the bootscript but the
>
> mpirun command line should just work as it is.
>
>
>
> - Gabor
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
> _______________________________________________
> gem5-users mailing list
> gem5-***@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
Gabor Dozsa
2018-09-10 08:23:28 UTC
Permalink
Hi Richard,

You wrote below:

“As noted earlier, mpirun will fail in gem5 without a list of hosts.”

This should not happen. Without a list of hosts, mpirun should launch all the MPI processes on ‘localhost’ (i.e. where mpirun is running). mpirun uses ssh to start new processes.

Make sure that ssh is working before you try mpirun. For example, run the following from your rcS script:
‘ssh localhost ls’
to check that ssh works locally, and
‘ssh 10.0.0.X ls’
to check that ssh can launch a new process on a remote host. If ssh does not work, check whether sshd is started by the Linux init.
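
The ssh checks described here could be scripted in the rcS file along the following lines. This is only a sketch: the helper name `can_ssh`, the `SSH` override variable, and the timeout value are assumptions made for illustration and are not part of the dist-gem5 bootscript. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical helper (not part of dist-gem5): succeed only if a trivial
# command can be run on host $1 over ssh without any interactive prompt.
# SSH may be overridden, e.g. to point at a different client binary.
SSH=${SSH:-ssh}

can_ssh() {
    # BatchMode=yes makes ssh fail instead of waiting for a password;
    # ConnectTimeout bounds how long an unreachable host can stall the script.
    "$SSH" -o BatchMode=yes -o ConnectTimeout=10 "$1" true >/dev/null 2>&1
}
```

In the rank 0 branch, a line such as `can_ssh localhost || echo "ssh to localhost failed"` placed before the mpirun call would show whether sshd is reachable; the same call with a peer address (10.0.0.X) exercises the remote case.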

- Gabor


From: gem5-users <gem5-users-***@gem5.org> on behalf of "Afoakwa, Richard" <***@ur.rochester.edu>
Reply-To: gem5 users mailing list <gem5-***@gem5.org>
Date: Friday, 7 September 2018 at 20:10
To: gem5 users mailing list <gem5-***@gem5.org>
Subject: Re: [gem5-users] dist-gem5 panic - No 32bit reads implemented for this device.

Gabor,

Thanks very much for your response. Using the vanilla version, it appears that the error message occurred because I was not including the host IP address(es) in my mpirun calls. Thanks again.

As a secondary question, I am trying to understand the basic framework of dist-gem5. From what I can infer, the "gem5-dist.sh" script launches gem5 FS processes (using the same *.rcS script and Linux image) on dedicated machines. Using the *.rcS script, each gem5 process updates the network configuration of the "image". For example, in the tutorials, this is done using the line:

/sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED} 10.0.0.${MY_ADDR}

Subsequently, the base gem5 process (the one with RANK = 0) can ping the other processes (as evident in the tutorial screenshot). Assuming all this works without error and I am trying to run an MPI application, the RANK 0 gem5 process needs a list of hosts to execute mpirun. As noted earlier, mpirun will fail in gem5 without a list of hosts.

For this purpose, I pass the list of host IP addresses, 10.0.0.${MY_ADDR}, to mpirun. But I keep receiving "connection refused" error messages after mpirun starts. Trying different ports does not work either.
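
As an illustration of passing a host list, the sketch below builds a comma-separated list of node addresses from the number of nodes. It assumes the addressing visible in this thread's terminal output, where rank 0 reports address 02 and rank 1 reports address 03 (so the last octet is rank + 2), and it defaults to the 10.0.0. prefix from the tutorial line. The function name `host_list` is hypothetical. Which option accepts such a list depends on the MPI implementation installed in the image, which the thread does not state: Open MPI's mpirun takes `--host`, while MPICH's Hydra launcher takes `-hosts`.

```shell
#!/bin/sh
# Hypothetical helper: print "<prefix>2,<prefix>3,..." for SIZE nodes,
# assuming the node with rank r was configured with last octet r + 2.
host_list() {
    size=$1
    prefix=${2:-10.0.0.}
    rank=0
    list=""
    while [ "$rank" -lt "$size" ]; do
        addr=$((rank + 2))
        # Prepend a comma separator for every entry except the first.
        list="${list:+$list,}${prefix}${addr}"
        rank=$((rank + 1))
    done
    printf '%s\n' "$list"
}
```

With Open MPI, for instance, the result could be passed as `mpirun --host "$(host_list "$MY_SIZE")" -np "$MY_SIZE" ...`. This can only work once ssh logins between the simulated nodes succeed.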

I would be grateful if anyone can provide some direction on this. Thanks.

Richard


Afoakwa, Richard
2018-09-12 16:33:34 UTC
Permalink
Hello Gabor,


I need to recheck my system configuration for any issues, then I will try out your suggestions and get back to you.


Thanks for your insights.


Richard
