qemu-e2k/include/qemu
Paolo Bonzini 2120465fbb queue: fix QSLIST_INSERT_HEAD_ATOMIC race
There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.

Because atomic_cmpxchg returns the old value instead of a success flag,
QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against
the second argument to atomic_cmpxchg.  Unfortunately, this only works
if the second argument is a local or thread-local variable.

If it is in memory, the second read can be subject to common subexpression
elimination (and then everything's fine) or reloaded after the
atomic_cmpxchg, depending on the compiler's whims.  In the latter case the
race becomes possible: another thread can sneak in and modify
elm->field.sle_next after the atomic_cmpxchg and before the comparison.
This causes a spurious failure, and then two threads are using "elm" at the
same time.  In the case discovered by Christian, the sequence was likely
something like this:

    thread 1                   | thread 2
    QSLIST_INSERT_HEAD_ATOMIC  |
      atomic_cmpxchg succeeds  |
      elm added to list        |
                               | steal release_pool
                               | QSLIST_REMOVE_HEAD
                               | elm removed from list
                               | ...
                               | QSLIST_INSERT_HEAD_ATOMIC
                               |   (overwrites sle_next)
      spurious failure         |
      atomic_cmpxchg succeeds  |
      elm added to list again  |
                               |
    steal release_pool         |
    QSLIST_REMOVE_HEAD         |
    elm removed again          |

The last three steps could be done by a third thread as well.
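
The difference between the racy and the safe comparison can be sketched
as follows.  This is a minimal illustration, not the actual queue.h
macros: struct node, struct list_head, cmpxchg_ptr, push_racy and
push_safe are hypothetical names, and the GCC/Clang builtin
__sync_val_compare_and_swap is assumed as a stand-in for atomic_cmpxchg
because it also returns the old value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a QSLIST element and head (not QEMU's types). */
struct node {
    int value;
    struct node *next;          /* plays the role of field.sle_next */
};

struct list_head {
    struct node *first;         /* plays the role of slh_first */
};

/* Assumption: like atomic_cmpxchg in the text, returns the OLD value. */
static struct node *cmpxchg_ptr(struct node **ptr, struct node *old,
                                struct node *new_node)
{
    return __sync_val_compare_and_swap(ptr, old, new_node);
}

/*
 * Racy pattern: success is checked against elm->next, which lives in
 * memory.  If the compiler reloads elm->next after the cmpxchg, another
 * thread that already popped and reused elm may have changed it, so a
 * successful insert can look like a failure and be retried.
 */
static void push_racy(struct list_head *head, struct node *elm)
{
    do {
        elm->next = __atomic_load_n(&head->first, __ATOMIC_RELAXED);
    } while (cmpxchg_ptr(&head->first, elm->next, elm) != elm->next);
}

/*
 * Safe pattern: the expected value is kept in a local variable, so the
 * comparison cannot observe writes made by other threads.
 */
static void push_safe(struct list_head *head, struct node *elm)
{
    struct node *expected;

    do {
        expected = __atomic_load_n(&head->first, __ATOMIC_RELAXED);
        elm->next = expected;
    } while (cmpxchg_ptr(&head->first, expected, elm) != expected);
}
```

Single-threaded, both functions behave identically; the difference only
shows up when another thread removes and reinserts "elm" between the
cmpxchg and the comparison, as in the sequence above.
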
A reproducer that failed in a matter of seconds is as follows:

- the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled),
  memory was 16G just to err on the safe side (the host has 64G, but hey
  at least you need no s390)

- the guest has 24 null-aio virtio-blk devices using dataplane
  (-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
  -device virtio-blk-pci,iothread=ioN,drive=blkN)

- the guest also has a single network interface.  It's only doing loopback
  tests, so slirp vs. tap and the model don't matter.

- the guest is running fio with the following script:

     [global]
     rw=randread
     blocksize=16k
     ioengine=libaio
     runtime=10m
     buffered=0
     fallocate=none
     time_based
     iodepth=32

     [virtio1a]
     filename=/dev/block/252\:16

     [virtio1b]
     filename=/dev/block/252\:16

     ...

     [virtio24a]
     filename=/dev/block/252\:384

     [virtio24b]
     filename=/dev/block/252\:384

     [listen1]
     protocol=tcp
     ioengine=net
     port=12345
     listen
     rw=read
     bs=4k
     size=1000g

     [connect1]
     protocol=tcp
     hostname=localhost
     ioengine=net
     port=12345
     rw=write
     startdelay=1
     size=1000g

     ...

     [listen8]
     protocol=tcp
     ioengine=net
     port=12352
     listen
     rw=read
     bs=4k
     size=1000g

     [connect8]
     protocol=tcp
     hostname=localhost
     ioengine=net
     port=12352
     rw=write
     startdelay=1
     size=1000g

Moral of the story: I should refrain from writing more clever stuff.
At least it looks like it is not too clever to be undebuggable.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1426002357-6889-1-git-send-email-pbonzini@redhat.com
Fixes: c740ad92d0
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-03-12 17:41:23 +00:00
acl.h
aes.h include/qemu/aes.h: Avoid conflicts with FreeBSD AES functions 2014-06-19 16:13:38 +01:00
atomic.h rcu: add rcu library 2015-02-02 16:55:10 +01:00
bitmap.h bitmap.h: Don't include qemu-common.h 2014-11-02 10:04:34 +03:00
bitops.h bitops.h: sextract64() return type should be int64_t, not uint64_t 2015-03-11 13:21:06 +00:00
bswap.h cpu_ldst.h, cpu-all.h, bswap.h: Update documentation on ld/st accessors 2015-01-20 15:19:35 +00:00
compatfd.h
compiler.h qemu/compiler: Define QEMU_ARTIFICIAL 2014-09-29 14:55:28 -04:00
config-file.h qemu-option: introduce qemu_find_opts_singleton 2014-04-27 13:04:18 +04:00
crc32c.h include/qemu/crc32c.h: Rename include guards to match filename 2014-02-26 17:20:07 +00:00
envlist.h
error-report.h qemu-error: Add error_vreport() 2014-10-09 15:36:15 +02:00
event_notifier.h
fifo8.h
hbitmap.h
host-utils.h target-ppc: Add ISA2.06 divde[o] Instructions 2014-03-05 03:06:39 +01:00
int128.h int128: Add int128_exts64() 2014-05-30 13:00:28 -06:00
iov.h
log.h qemu-log: add log category for MMU info 2014-12-16 18:43:19 +00:00
main-loop.h async: aio_context_new(): Handle event_notifier_init failure 2014-09-22 11:39:48 +01:00
module.h
notify.h
option_int.h QemuOpts: change opt->name|str from (const char *) to (char *) 2014-06-16 17:23:20 +08:00
option.h qemu-img: Suppress unhelpful extra errors in convert, amend 2015-02-26 14:51:21 +01:00
osdep.h memory: expose alignment used for allocating RAM as MemoryRegion API 2014-11-23 12:11:30 +02:00
queue.h queue: fix QSLIST_INSERT_HEAD_ATOMIC race 2015-03-12 17:41:23 +00:00
range.h Introduce signed range. 2014-06-19 18:44:19 +03:00
ratelimit.h
rcu_queue.h rcu: introduce RCU-enabled QLIST 2015-02-16 17:30:19 +01:00
rcu.h rcu: add g_free_rcu 2015-02-16 17:30:19 +01:00
readline.h
rfifolock.h rfifolock: add recursive FIFO lock 2014-03-13 14:42:21 +01:00
seqlock.h
sockets.h socket shutdown 2015-01-16 13:06:17 +05:30
thread-posix.h
thread-win32.h
thread.h rcu: add rcu library 2015-02-02 16:55:10 +01:00
throttle.h throttle: add throttle_detach/attach_aio_context() 2014-06-04 09:56:12 +02:00
timer.h cpu-exec: simplify init_delay_params 2015-02-02 16:55:11 +01:00
tls.h
typedefs.h Add device listener interface 2015-01-20 14:24:07 +00:00
uri.h
xattr.h