2120465fbb
There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
Because atomic_cmpxchg returns the old value instead of a success flag,
QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against
the second argument to atomic_cmpxchg. Unfortunately, this only works
if the second argument is a local or thread-local variable.
If it is in memory, it can be subject to common subexpression elimination
(and then everything's fine) or reloaded after the atomic_cmpxchg,
depending on the compiler's whims. If the latter happens, the race can
happen. A thread can sneak in, doing something on elm->field.sle_next
after the atomic_cmpxchg and before the comparison. This causes a wrong
failure, and then two threads are using "elm" at the same time. In the
case discovered by Christian, the sequence was likely something like this:
thread 1 | thread 2
QSLIST_INSERT_HEAD_ATOMIC |
atomic_cmpxchg succeeds |
elm added to list |
| steal release_pool
| QSLIST_REMOVE_HEAD
| elm removed from list
| ...
| QSLIST_INSERT_HEAD_ATOMIC
| (overwrites sle_next)
spurious failure |
atomic_cmpxchg succeeds |
elm added to list again |
|
steal release_pool |
QSLIST_REMOVE_HEAD |
elm removed again |
The last three steps could be done by a third thread as well.
A reproducer that failed in a matter of seconds is as follows:
- the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled),
memory was 16G just to err on the safe side (the host has 64G, but hey
at least you need no s390)
- the guest has 24 null-aio virtio-blk devices using dataplane
(-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
-device virtio-blk-pci,iothread=ioN,drive=blkN)
- the guest also has a single network interface. It's only doing loopback
tests so slirp vs. tap and the model doesn't matter.
- the guest is running fio with the following script:
[global]
rw=randread
blocksize=16k
ioengine=libaio
runtime=10m
buffered=0
fallocate=none
time_based
iodepth=32
[virtio1a]
filename=/dev/block/252\:16
[virtio1b]
filename=/dev/block/252\:16
...
[virtio24a]
filename=/dev/block/252\:384
[virtio24b]
filename=/dev/block/252\:384
[listen1]
protocol=tcp
ioengine=net
port=12345
listen
rw=read
bs=4k
size=1000g
[connect1]
protocol=tcp
hostname=localhost
ioengine=net
port=12345
protocol=tcp
rw=write
startdelay=1
size=1000g
...
[listen8]
protocol=tcp
ioengine=net
port=12352
listen
rw=read
bs=4k
size=1000g
[connect8]
protocol=tcp
hostname=localhost
ioengine=net
port=12352
rw=write
startdelay=1
size=1000g
Moral of the story: I should refrain from writing more clever stuff.
At least it looks like it is not too clever to be undebuggable.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1426002357-6889-1-git-send-email-pbonzini@redhat.com
Fixes:
|
||
---|---|---|
.. | ||
acl.h | ||
aes.h | ||
atomic.h | ||
bitmap.h | ||
bitops.h | ||
bswap.h | ||
compatfd.h | ||
compiler.h | ||
config-file.h | ||
crc32c.h | ||
envlist.h | ||
error-report.h | ||
event_notifier.h | ||
fifo8.h | ||
hbitmap.h | ||
host-utils.h | ||
int128.h | ||
iov.h | ||
log.h | ||
main-loop.h | ||
module.h | ||
notify.h | ||
option_int.h | ||
option.h | ||
osdep.h | ||
queue.h | ||
range.h | ||
ratelimit.h | ||
rcu_queue.h | ||
rcu.h | ||
readline.h | ||
rfifolock.h | ||
seqlock.h | ||
sockets.h | ||
thread-posix.h | ||
thread-win32.h | ||
thread.h | ||
throttle.h | ||
timer.h | ||
tls.h | ||
typedefs.h | ||
uri.h | ||
xattr.h |