qemu-e2k/include
Paolo Bonzini 2120465fbb queue: fix QSLIST_INSERT_HEAD_ATOMIC race
There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.

Because atomic_cmpxchg returns the old value instead of a success flag,
QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against
the second argument to atomic_cmpxchg.  Unfortunately, this only works
if the second argument is a local or thread-local variable.

If it is in memory, it can be subject to common subexpression elimination
(and then everything's fine) or reloaded after the atomic_cmpxchg,
depending on the compiler's whims.  If the latter happens, the race can
happen.  A thread can sneak in, doing something on elm->field.sle_next
after the atomic_cmpxchg and before the comparison.  This causes a wrong
failure, and then two threads are using "elm" at the same time.  In the
case discovered by Christian, the sequence was likely something like this:

    thread 1                   | thread 2
    QSLIST_INSERT_HEAD_ATOMIC  |
      atomic_cmpxchg succeeds  |
      elm added to list        |
                               | steal release_pool
                               | QSLIST_REMOVE_HEAD
                               | elm removed from list
                               | ...
                               | QSLIST_INSERT_HEAD_ATOMIC
                               |   (overwrites sle_next)
      spurious failure         |
      atomic_cmpxchg succeeds  |
      elm added to list again  |
                               |
    steal release_pool         |
    QSLIST_REMOVE_HEAD         |
    elm removed again          |

The last three steps could be done by a third thread as well.
A reproducer that failed in a matter of seconds is as follows:

- the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled),
  memory was 16G just to err on the safe side (the host has 64G, but hey
  at least you need no s390)

- the guest has 24 null-aio virtio-blk devices using dataplane
  (-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
  -device virtio-blk-pci,iothread=ioN,drive=blkN)

- the guest also has a single network interface.  It's only doing loopback
  tests so slirp vs. tap and the model doesn't matter.

- the guest is running fio with the following script:

     [global]
     rw=randread
     blocksize=16k
     ioengine=libaio
     runtime=10m
     buffered=0
     fallocate=none
     time_based
     iodepth=32

     [virtio1a]
     filename=/dev/block/252\:16

     [virtio1b]
     filename=/dev/block/252\:16

     ...

     [virtio24a]
     filename=/dev/block/252\:384

     [virtio24b]
     filename=/dev/block/252\:384

     [listen1]
     protocol=tcp
     ioengine=net
     port=12345
     listen
     rw=read
     bs=4k
     size=1000g

     [connect1]
     protocol=tcp
     hostname=localhost
     ioengine=net
     port=12345
     protocol=tcp
     rw=write
     startdelay=1
     size=1000g

     ...

     [listen8]
     protocol=tcp
     ioengine=net
     port=12352
     listen
     rw=read
     bs=4k
     size=1000g

     [connect8]
     protocol=tcp
     hostname=localhost
     ioengine=net
     port=12352
     rw=write
     startdelay=1
     size=1000g

Moral of the story: I should refrain from writing more clever stuff.
At least it looks like it is not too clever to be undebuggable.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1426002357-6889-1-git-send-email-pbonzini@redhat.com
Fixes: c740ad92d0
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-03-12 17:41:23 +00:00
..
block block: add bdrv functions for geometry and blocksize 2015-03-10 14:02:21 +01:00
disas
exec pc-dimm: add a function to calculate VM's current RAM size 2015-03-04 13:00:04 -05:00
fpu softfloat: expand out STATUS macro 2015-02-06 16:11:38 +00:00
hw misc fixes and cleanups 2015-03-12 09:13:07 +00:00
libdecnumber
migration Add more VMSTATE_*_TEST variants for integers 2015-03-09 14:59:56 +01:00
monitor Clean up around error_get_pretty(), qerror_report_err() 2015-02-26 07:01:08 +00:00
net virtio-net,tap: use standard-headers 2015-02-26 13:04:04 +01:00
qapi qerror.h: Swap definitions that were not in alphabetical order 2015-03-10 08:15:33 +03:00
qemu queue: fix QSLIST_INSERT_HEAD_ATOMIC race 2015-03-12 17:41:23 +00:00
qom cpu: Add missing documentation for some CPUClass methods 2015-03-10 17:07:27 +01:00
standard-headers misc fixes and cleanups 2015-03-12 09:13:07 +00:00
sysemu misc fixes and cleanups 2015-03-12 09:13:07 +00:00
ui vnc bugfixes. 2015-03-10 19:28:09 +00:00
config.h
elf.h elf-loader: Provide the possibility to relocate s390 ELF files 2015-03-10 09:26:27 +01:00
glib-compat.h
qemu-common.h qxl: refactor rounding up to a nearest power of 2 2015-03-03 08:33:08 +01:00
qemu-io.h qemu-io: Use BlockBackend 2015-02-16 15:07:19 +00:00
qjson.h
trace-tcg.h
trace.h