Timothy Hayes
2018-12-05 18:34:16 UTC
I'm benchmarking some multithreaded ARMv8 codes on gem5 Ruby and am constantly running into live lock. I've started investigating this and am a little confused with regard to the way load-linked and store-conditional (LDEX/STEX) are treated in Ruby and its protocols.
In all of the coherence protocol implementations that I've looked at so far, RubyRequestType::ATOMIC types are treated the same as stores. This means that the LDEX will try to load its address in modified state. This doesn't appear to play well with the idiomatic way of writing a spinlock in ARMv8, e.g.
spin_mutex_lock(lock):
    spinlock_unlock_wait(lock);
    spinlock_lock(lock);

spinlock_unlock_wait(lock):
    int tmp;
    __asm__ volatile(
    "       sevl\n"
    "1:     wfe\n"                   /* sleep until an event */
    "       ldaxr   %w0, %1\n"       /* exclusive load of the lock word */
    "       cbnz    %w0, 1b\n"       /* still held: wait again */
    : "=&r" (tmp)
    : "Q" (lock));

spinlock_lock(lock):
    int lockval = 1;
    int tmp;
    __asm__ volatile(
    "       sevl\n"
    "       prfm    pstl1strm, %1\n" /* prefetch the lock line for store */
    "1:     wfe\n"                   /* sleep until an event */
    "2:     ldaxr   %w0, %2\n"       /* exclusive load of the lock word */
    "       cbnz    %w0, 1b\n"       /* held: go back to waiting */
    "       stxr    %w0, %w3, %1\n"  /* try to claim the lock */
    "       cbnz    %w0, 2b\n"       /* store-conditional failed: retry */
    : "=&r" (tmp), "+Q" (lock)
    : "Q" (lock), "r" (lockval)
    : "memory");

spinlock_unlock(lock):
    atomic_store_explicit(&lock, 0, memory_order_release);
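For reference, here is my (possibly imperfect) reading of where this classification happens, paraphrased from gem5's src/mem/ruby/system/Sequencer.cc; the exact code may differ between versions, but the effect is that both halves of the exclusive pair reach the protocol tagged as ATOMIC:

    /* Paraphrase of Sequencer::makeRequest(), from memory; not verbatim. */
    if (pkt->isLLSC()) {
        if (pkt->isWrite()) {
            primary_type = RubyRequestType_Store_Conditional;  /* STXR */
        } else {
            primary_type = RubyRequestType_Load_Linked;        /* LDXR/LDAXR */
        }
        /* The RubyRequest handed to the protocol carries the secondary
         * type, so the protocol sees ATOMIC and handles it like a store. */
        secondary_type = RubyRequestType_ATOMIC;
    }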
The issue I see is that both unlock_wait and lock use LDEX, which loads the lock in modified state. If there is contention for the lock, the LDEX will cause the lock's cache line to ping-pong between cores in modified state before the lock is ever taken, which can lead to live lock under modest contention.
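For what it's worth, the only mitigation I can think of (a sketch only, under the assumption that plain loads are issued to Ruby as LD and can be serviced in Shared state; I haven't validated this) is a test-and-test-and-set variant that spins on an ordinary load and only attempts the exclusive pair once the lock is observed free:

    #include <stdatomic.h>

    /* Hypothetical test-and-test-and-set sketch: spin with plain loads so
     * the line can stay in Shared state, and attempt the exclusive pair
     * only when the lock looks free. Names here are illustrative. */
    static inline void spinlock_lock_ttas(atomic_int *lock)
    {
        for (;;) {
            /* Read-only spin: no exclusive ownership requested. */
            while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
                ;
            /* Exclusive attempt: compiles to LDAXR/STXR (or CAS) on ARMv8. */
            int expected = 0;
            if (atomic_compare_exchange_weak_explicit(
                    lock, &expected, 1,
                    memory_order_acquire, memory_order_relaxed))
                return;
        }
    }

That said, this only narrows the window of exclusive traffic rather than answering the underlying question.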
Could somebody explain the rationale behind this implementation of load-linked/store-conditional? I'm guessing it was written with another architecture in mind (Alpha?) but I'm unsure how a mutex could be correctly implemented given these characteristics.
--
Timothy Hayes
Senior Research Engineer
Arm Research
Phone: +44-1223405170
***@arm.com