Commit Graph

48281 Commits

Author SHA1 Message Date
Peter Maydell
8ede883cfa -----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJX15GWAAoJEL2+eyfA3jBX7M4QALZlqygFJ9qzyURdVwnGFx89
 zlFSwX39W4osJZVSZKKEk/60twyZpPnpcbghh6/HhaARB/erK3WNnk8KmY701BEP
 fus3/2aMyH1Sg7RjCxIWmYItsBOmmJFYJLFFNMNHvvFVDhVpSyc2G746TtcZMMIk
 gsaWkxxlAfdUJRv2AuWXsclu2yaWR/MSc4q6kYbzd/Y3LF97KrNu+XbAzJuIFO4N
 tDgxpFIaMqfpEu1UbrLrUWjVC0PWjt+6KzkxYJKC4g8UmeFidlKx3clzANvzh+dy
 LombZ7vuYAHRZiz2BhVX+A6HpuQG3OLumo/xjw7KC3IN8+iEhOQTzC7vVq6kzGYk
 /BUrQ2SVwQKiOrnKwLnJcgWZgZcp1h8gNkqLPI4p4APdN8qLrtGKLIpdLiZPkYBU
 DVcPeXq+SpdIen4c0yZEwPJkS+5hZPyd1CYBYp2dYZ/sVIrXMQ54LeIzDU4LzlUO
 MKGZslDiJiTMOveFS0DWKcbN11uXNUwbFlMTyvEFKgKRwWFKdRuqjqY8iC6pNV74
 gnYvLpVCOzQlIQtufnW+O7lAsJH7rFHckP3Ffh5rhPhgbmvZLxnil/mhculpP1yn
 Ze0OPTzMjT7f1m9k8ptKOZbs8QO81ueIWQVEmpXeL7F0GK7qSbrK/LTbgzqDjK9m
 2ofD/lRB3A68kz+dUDBa
 =0HpW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 13 Sep 2016 06:41:42 BST
# gpg:                using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  qapi/block-core: add doc describing GlusterServer vs. SocketAddress
  block/gluster: add support to choose libgfapi logfile

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-13 11:40:21 +01:00
Li Qiang
b53dd4495c usb:xhci:fix memory leak in usb_xhci_exit
If the xhci uses msix, it doesn't free the corresponding
memory, thus leading a memory leak. This patch avoid this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 57d7d2e0.d4301c0a.d13e9.9a55@mx.google.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-09-13 12:33:09 +02:00
Fam Zheng
dce8921b2b iothread: Stop threads before main() quits
Right after main_loop ends, we release various things but keep iothread
alive. The latter is not prepared to the sudden change of resources.

Specifically, after bdrv_close_all(), virtio-scsi dataplane get a
surprise at the empty BlockBackend:

(gdb) bt
    at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:543
    at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:577

It is because the d->conf.blk->root is set to NULL, then
blk_get_aio_context() returns qemu_aio_context, whereas s->ctx is still
pointing to the iothread:

    hw/scsi/virtio-scsi.c:543:

    if (s->dataplane_started) {
        assert(blk_get_aio_context(d->conf.blk) == s->ctx);
    }

To fix this, let's stop iothreads before doing bdrv_close_all().

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1473326931-9699-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:57 +01:00
Laurent Vivier
e49f827725 tests: fix qvirtqueue_kick
vq->avail.idx and vq->avail->ring[] are a 16bit values,
so read and write them with readw()/writew() instead of
readl()/writel().

To read/write a 16bit value with a 32bit accessor works fine
on little-endian CPU but not on big endian CPU.

[An equivalent patch for the writew() calls was also sent by
Zhang Shuai <zhangshuai13@huawei.com>.
--Stefan]

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1472330054-22607-1-git-send-email-lvivier@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:57 +01:00
Changlong Xie
049105a3c1 MAINTAINERS: add maintainer for replication
As per Stefan's suggestion, add Wen and I as co-maintainers
of replication.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-13-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang
82ac554345 support replication driver in blockdev-add
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1469602913-20979-12-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Changlong Xie
b311046696 tests: add unit test case for replication
[Rename get_error test cases to get_error_all to avoid tripping up
scripts that grep for "error:" in test output.  It also reflects the
actual replication API function name better.
-Stefan]

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-11-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang
29ff789060 replication: Implement new driver for block replication
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-id: 1469602913-20979-10-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Changlong Xie
190b9a8b55 replication: Introduce new APIs to do replication operation
This commit introduces six replication interfaces(for block, network etc).
Firstly we can use replication_(new/remove) to create/destroy replication
instances, then in migration we can use replication_(start/stop/do_checkpoint
/get_error)_all to handle all replication operations. More detail please
refer to replication.h

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-id: 1469602913-20979-9-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Changlong Xie
a6b1d4c081 configure: support replication
configure --(enable/disable)-replication to switch replication
support on/off, and it is on by default.
We later introduce replation support.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-8-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang
b49f7ead8d mirror: auto complete active commit
Auto complete mirror job in background to prevent from
blocking synchronously

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang
68365a3843 docs: block replication's description
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-6-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang
258854ad1d block: Link backup into block core
Some programs that add a dependency on it will use
the block layer directly.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1469602913-20979-5-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Changlong Xie
a8bbee0edf Backup: export interfaces for extra serialization
Normal backup(sync='none') workflow:
step 1. NBD peformance I/O write from client to server
   qcow2_co_writev
    bdrv_co_writev
     ...
       bdrv_aligned_pwritev
        notifier_with_return_list_notify -> backup_do_cow
         bdrv_driver_pwritev // write new contents

step 2. drive-backup sync=none
   backup_do_cow
   {
    wait_for_overlapping_requests
    cow_request_begin
    for(; start < end; start++) {
            bdrv_co_readv_no_serialising //read old contents from Secondary disk
            bdrv_co_writev // write old contents to hidden-disk
    }
    cow_request_end
   }

step 3. Then roll back to "step 1" to write new contents to Secondary disk.

And for replication, we must make sure that we only read the old contents from
Secondary disk in order to keep contents consistent.

1) Replication workflow of Secondary
                                                         virtio-blk
                                                              ^
------->  1 NBD                                               |
   ||     server                                       3 replication
   ||        ^                                                ^
   ||        |           backing                 backing      |
   ||  Secondary disk 6<-------- hidden-disk 5 <-------- active-disk 4
   ||        |                         ^
   ||        '-------------------------'
   ||           drive-backup sync=none 2

Hence, we need these interfaces to implement coarse-grained serialization between
COW of Secondary disk and the read operation of replication.

Example codes about how to use them:

*#include "block/block_backup.h"

static coroutine_fn int xxx_co_readv()
{
        CowRequest req;
        BlockJob *job = secondary_disk->bs->job;

        if (job) {
              backup_wait_for_overlapping_requests(job, start, end);
              backup_cow_request_begin(&req, job, start, end);
              ret = bdrv_co_readv();
              backup_cow_request_end(&req);
              goto out;
        }
        ret = bdrv_co_readv();
out:
        return ret;
}

Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-4-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang
49d3e828f8 Backup: clear all bitmap when doing block checkpoint
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Wen Congyang
e9d6456e95 block: unblock backup operations in backing file
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Kashyap Chamarthy <kchamart@redhat.com>
Message-id: 1469602913-20979-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Changlong Xie
b5c7ceaf4b virtio-blk: rename virtio_device_info to virtio_blk_info
The old one is confusing with @virtio_device_info in virtio.c,
so make it more appropriate.

Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1470214147-32560-1-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Roman Pen
0ed93d84ed linux-aio: process completions from ioq_submit()
In order to reduce completion latency it makes sense to harvest completed
requests ASAP.  Very fast backend device can complete requests just after
submission, so it is worth trying to check ring buffer in order to peek
completed requests directly after io_submit() has been called.

Indeed, this patch reduces the completions latencies and increases the
overall throughput, e.g. the following is the percentiles of number of
completed requests at once:

        1th 10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2    4    42   112   128   128   128   128   128   128    128
 After    1    1     4    14    33    45    47    48    50    51    108

That means, that before the current patch is applied the ring buffer is
observed as full (128 requests were consumed at once) in 60% of calls.

After patch is applied the distribution of number of completed requests
is "smoother" and the queue (requests in-flight) is almost never full.

The fio read results are the following (write results are almost the
same and are not showed here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  ------
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, average completion latencies
decreased from 1051us to 988us.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-4-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Roman Pen
3407de572b linux-aio: split processing events function
Prepare processing events function to be called from ioq_submit(),
thus split function on two parts: the first harvests completed IO
requests, the second submits pending requests.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-3-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:55 +01:00
Roman Pen
9e909a5829 linux-aio: consume events in userspace instead of calling io_getevents
AIO context in userspace is represented as a simple ring buffer, which
can be consumed directly without entering the kernel, which obviously
can bring some performance gain.  QEMU does not use timeout value for
waiting for events completions, so we can consume all events from
userspace.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-2-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:55 +01:00
Stefan Hajnoczi
0647d47cc1 qcow2: avoid memcpy(dst, NULL, len)
Section "7.1.4 Use of library functions" in the C99 standard says:

  If an argument to a function has an invalid value (such as [...]
  a null pointer [...]) [...] the behavior is undefined.

Additionally the "searching and sorting" functions are specified as
requiring valid pointer values as described in 7.1.4.

This patch fixes the following sanitizer errors:

  block/qcow2.c:1807:41: runtime error: null pointer passed as argument 2, which is declared to never be null
  block/qcow2-cluster.c:86:26: runtime error: null pointer passed as argument 2, which is declared to never be null

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1473758138-19260-1-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:55 +01:00
Gerd Hoffmann
c2843e9390 virtio-vga: adapt to page-per-vq=off
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1473319012-27560-1-git-send-email-kraxel@redhat.com
2016-09-13 09:28:10 +02:00
Gerd Hoffmann
597966d110 virtio-gpu-pci: tag as not hotpluggable
We can't hotplug display adapters in qemu, tag virtio-gpu-pci
accordingly (virtio-vga already has this).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1473319037-27645-1-git-send-email-kraxel@redhat.com
2016-09-13 09:26:58 +02:00
Prasad J Pandit
167d97a3de vmsvga: correct bitmap and pixmap size checks
When processing svga command DEFINE_CURSOR in vmsvga_fifo_run,
the computed BITMAP and PIXMAP size are checked against the
'cursor.mask[]' and 'cursor.image[]' array sizes in bytes.
Correct these checks to avoid OOB memory access.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1473338754-15430-1-git-send-email-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-09-13 09:24:35 +02:00
Gerd Hoffmann
6a71123469 usb-host: fix streams detection in usb_host_speed_compat
The companion descriptor is present on all usb3 devices, not only
those with streams support.  We need to check attributes to see
whenever the device uses streams or not.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1473406890-30164-1-git-send-email-kraxel@redhat.com
2016-09-13 09:19:26 +02:00
Hans Petter Selasky
b66ad1f1aa xhci: Fix remainder field for TR_SETUP completion event.
Previously the code would incorrectly report the remainder as 8 bytes. A
remainder of 0 bytes should be reported when the SETUP packet is
successfully transferred. Found using FreeBSD's XHCI driver.

Signed-off-by: Hans Petter Selasky <hps@selasky.org>

[ kraxel: codestyle fixup ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-09-13 09:07:18 +02:00
Gonglei
3e10c3ecfc vnc: fix qemu crash because of SIGSEGV
The backtrace is:

0x00007f0b75cdf880 in pixman_image_get_stride () from /lib64/libpixman-1.so.0
0x00007f0b77bcb3cf in vnc_server_fb_stride (vd=0x7f0b7a1a2bb0) at ui/vnc.c:680
vnc_dpy_copy (dcl=0x7f0b7a1a2c00, src_x=224, src_y=263, dst_x=319, dst_y=363, w=1, h=1) at ui/vnc.c:915
0x00007f0b77bbcc35 in dpy_gfx_copy (con=0x7f0b7a146210, src_x=src_x@entry=224, src_y=src_y@entry=263, dst_x=dst_x@entry=319,
dst_y=dst_y@entry=363, w=1, h=1) at ui/console.c:1575
0x00007f0b77bbda4e in qemu_console_copy (con=<optimized out>, src_x=src_x@entry=224, src_y=src_y@entry=263, dst_x=dst_x@entry=319,
dst_y=dst_y@entry=363, w=<optimized out>, h=<optimized out>) at ui/console.c:2111
0x00007f0b77ac0980 in cirrus_do_copy (h=<optimized out>, w=<optimized out>, src=<optimized out>, dst=<optimized out>, s=0x7f0b7b086090) at hw/display/cirrus_vga.c:774
cirrus_bitblt_videotovideo_copy (s=0x7f0b7b086090) at hw/display/cirrus_vga.c:793
cirrus_bitblt_videotovideo (s=0x7f0b7b086090) at hw/display/cirrus_vga.c:915
cirrus_bitblt_start (s=0x7f0b7b086090) at hw/display/cirrus_vga.c:1056
0x00007f0b77965cfb in memory_region_write_accessor (mr=0x7f0b7b096e40, addr=320, value=<optimized out>, size=1, shift=<optimized out>,mask=<optimized out>, attrs=...) at /root/rpmbuild/BUILD/master/qemu/memory.c:525
0x00007f0b77963f59 in access_with_adjusted_size (addr=addr@entry=320, value=value@entry=0x7f0b69a268d8, size=size@entry=4,
access_size_min=<optimized out>, access_size_max=<optimized out>, access=access@entry=0x7f0b77965c80 <memory_region_write_accessor>,
mr=mr@entry=0x7f0b7b096e40, attrs=attrs@entry=...) at /root/rpmbuild/BUILD/master/qemu/memory.c:591
0x00007f0b77968315 in memory_region_dispatch_write (mr=mr@entry=0x7f0b7b096e40, addr=addr@entry=320, data=18446744073709551362,
size=size@entry=4, attrs=attrs@entry=...) at /root/rpmbuild/BUILD/master/qemu/memory.c:1262
0x00007f0b779256a9 in address_space_write_continue (mr=0x7f0b7b096e40, l=4, addr1=320, len=4, buf=0x7f0b77713028 "\002\377\377\377",
attrs=..., addr=4273930560, as=0x7f0b7827d280 <address_space_memory>) at /root/rpmbuild/BUILD/master/qemu/exec.c:2544
address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /root/rpmbuild/BUILD/master/qemu/exec.c:2601
0x00007f0b77925c1d in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=...,
buf=buf@entry=0x7f0b77713028 "\002\377\377\377", len=<optimized out>, is_write=<optimized out>) at /root/rpmbuild/BUILD/master/qemu/exec.c:2703
0x00007f0b77962f53 in kvm_cpu_exec (cpu=cpu@entry=0x7f0b79fcc2d0) at /root/rpmbuild/BUILD/master/qemu/kvm-all.c:1965
0x00007f0b77950cc6 in qemu_kvm_cpu_thread_fn (arg=0x7f0b79fcc2d0) at /root/rpmbuild/BUILD/master/qemu/cpus.c:1078
0x00007f0b744b3dc5 in start_thread (arg=0x7f0b69a27700) at pthread_create.c:308
0x00007f0b70d3d66d in clone () from /lib64/libc.so.6

The code path while meeting segfault:
 vnc_dpy_copy
   vnc_update_client
     vnc_disconnect_finish [while vnc_disconnect_start() is invoked because somethins wrong]
       vnc_update_server_surface
         vd->server = NULL;
   vnc_server_fb_stride
     pixman_image_get_stride(vd->server)

Let's add a non-NULL check before calling vnc_server_fb_stride() to avoid segmentation fault.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Reported-by: Yanying Zhuang <ann.zhuangyanying@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1472788698-120964-1-git-send-email-arei.gonglei@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-09-13 08:01:39 +02:00
Li Zhijian
93ca519ec4 qemu-options.hx: correct spice options streaming-video default document value to 'off'
since f1d3e58, the code had changed the default value to 'off', so this patch
make document and code are consistent.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-id: 1470024419-10886-1-git-send-email-lizhijian@cn.fujitsu.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-09-13 08:01:39 +02:00
Peter Maydell
99a9ef44dc ui/curses.c: Clean up nextchr logic
Coverity identifies that at the top of the while(1) loop
in curses_refresh() the variable nextchr is always ERR,
and so the else case of the first if() is dead code.
Remove this dead code, and narrow the scope of the
nextchr variable to the place where it's used.

(This confused logic has been present since the curses
code was added to QEMU in 2008.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1470925407-23850-3-git-send-email-peter.maydell@linaro.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-09-13 08:01:39 +02:00
Peter Maydell
bba4e1b591 ui/curses.c: Ensure we don't read off the end of curses2qemu array
Coverity spots that there is no bounds check before we
access the curses2qemu[] array.  Add one, bringing this
code path into line with the one that looks up entries
in curses2keysym[].

In theory getch() shouldn't return out of range keycodes,
but it's better not to assume this.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1470925407-23850-2-git-send-email-peter.maydell@linaro.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-09-13 08:01:39 +02:00
Prasanna Kumar Kalever
c76d7aab81 qapi/block-core: add doc describing GlusterServer vs. SocketAddress
Added documentation describing relation between GlusterServer and
SocketAddress qapi schemas.

Thanks to Markus Armbruster <armbru@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Message-id: 1471715924-3642-1-git-send-email-prasanna.kalever@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-09-13 01:34:55 -04:00
Prasanna Kumar Kalever
e9db8ff38e block/gluster: add support to choose libgfapi logfile
currently all the libgfapi logs defaults to '/dev/stderr' as it was hardcoded
in a call to glfs logging api. When the debug level is chosen to DEBUG/TRACE,
gfapi logs will be huge and fill/overflow the console view.

This patch provides a commandline option to mention log file path which helps
in logging to the specified file and also help in persisting the gfapi logs.

Usage:
-----
 *URI Style:
  ---------
  -drive file=gluster://hostname/volname/image.qcow2,file.debug=9,\
                      file.logfile=/var/log/qemu/qemu-gfapi.log

 *JSON Style:
  ----------
  'json:{
           "driver":"qcow2",
           "file":{
              "driver":"gluster",
              "volume":"volname",
              "path":"image.qcow2",
              "debug":"9",
              "logfile":"/var/log/qemu/qemu-gfapi.log",
              "server":[
                 {
                    "type":"tcp",
                    "host":"1.2.3.4",
                    "port":24007
                 },
                 {
                    "type":"unix",
                    "socket":"/var/run/glusterd.socket"
                 }
              ]
           }
        }'

Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-09-13 01:34:47 -04:00
Peter Maydell
7263da7804 Update OpenBIOS images
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJX1oidAAoJEFvCxW+uDzIfFYUH/3Jg8V4MmqeHCPqOsyvkwww/
 CtLgZcNxKYGO9sjwjfcUT+/PaCX1F9k30WpPGE3FZ5IQ1aKTRP5asiOWtpGWxFnS
 WsIKkDNRk1oMRJr0LNIcuojiyJRSro1HPAhSygisPo7B/Y9m/zy2A/ASb1Nd48k6
 Dn3foygqnbgRg/N0pDaNrxIdn/nKfGL0cJX0GJXYZ700QUtKYnGdLHNUXcHMIxvP
 +J5qp4rWaDYCTy6WfUgOHG1ide/F6Kf0k0SLWU5an2XHsZX3WG9G6GcSk2SIsX2j
 gqCObEFBOZinNfpRKncrRB6cA/aFlZoAzkJRpHxpZgyAblhtXNflGwzTiWBM/7M=
 =wSuC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging

Update OpenBIOS images

# gpg: Signature made Mon 12 Sep 2016 11:51:09 BST
# gpg:                using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C  C9C4 5BC2 C56F AE0F 321F

* remotes/mcayland/tags/qemu-openbios-signed:
  Update OpenBIOS images to c5542f2 built from submodule.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-12 15:09:47 +01:00
Peter Maydell
d4c61988b8 Merge qcrypto 2016/09/12 v1
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJX1os8AAoJEL6G67QVEE/fp+wP/jif//hLmY8hA0lFMzNRrpbf
 wofUbnMnuTjh5s9fgE82BhmyahQutC6IOYu/TqX5mZd+NbQgdh/sxnii9n6rLFxJ
 jXxfPVQ6r8n/73Nza7zU9J+sUFWd6vV352tfcbxc1X2AWltxgXAYY+Y0NqAtdeip
 qlMXyaZg6ZQzOEradrt5o4hcKnvDzOpV3/Xn2Ci1G72suUU7dXth6fYBpxB5WqRK
 JrgN22tDl3Xn5Ly7CxlBoQdQ0VgMQC/Wm1SKwxnNkEpQ7WKbmIqUxchjahug1BsA
 bSHGmlYzOD+MQGLN4qUYlMEA8FQqdzHmVZn98usnXM43YxT6oZJx9ZH0xO9tfV4V
 tzhx0jstNTO90mcxa3RYN24tbPDnxz/raFjlRjW6bvrPLb16QdpEczzkC1Cpz9E2
 /r/3IF5ZNrH8QV/jCc+FFI3oL+83y7vVcsd71KreddEJLXt2PBs2qXnw1lE7i78e
 qciHr92ggNpu1+su4zcgtSiiX+5kwxwovTe8MhaHw7vykH8yc8xhEYzKQa7tI75h
 jRJKWQK2Qo4dDCFRLC4kOzVsqk9lgU8AkJ/TUS57Ug5BVoNGCM/vmmrG6ZBTaWPp
 gL08He2/Px0dn/wCydbuFe2ZnDk/ROPOJ/jKn5WgJdiAtA9LULPDdI6bljZxXHjc
 O30NO6NLhENXLyE6GJDj
 =x7IS
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-09-12-1' into staging

Merge qcrypto 2016/09/12 v1

# gpg: Signature made Mon 12 Sep 2016 12:02:20 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qcrypto-2016-09-12-1:
  crypto: report enum strings instead of values in errors
  crypto: fix building complaint
  crypto: ensure XTS is only used with ciphers with 16 byte blocks

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-12 12:48:47 +01:00
Daniel P. Berrange
90d6f60d07 crypto: report enum strings instead of values in errors
Several error messages print out the raw enum value, which
is less than helpful to users, as these values are not
documented, nor stable across QEMU releases. Switch to use
the enum string instead.

The nettle impl also had two typos where it mistakenly
said "algorithm" instead of "mode", and actually reported
the algorithm value too.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-12 12:00:52 +01:00
Gonglei
d9269b274a crypto: fix building complaint
gnutls commit 846753877d renamed LIBGNUTLS_VERSION_NUMBER to GNUTLS_VERSION_NUMBER.
If using gnutls before that verion, we'll get the below warning:
crypto/tlscredsx509.c:618:5: warning: "GNUTLS_VERSION_NUMBER" is not defined

Because gnutls 3.x still defines LIBGNUTLS_VERSION_NUMBER for back compat, Let's
use LIBGNUTLS_VERSION_NUMBER instead of GNUTLS_VERSION_NUMBER to fix building
complaint.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-12 12:00:52 +01:00
Daniel P. Berrange
a5d2f44d0d crypto: ensure XTS is only used with ciphers with 16 byte blocks
The XTS cipher mode needs to be used with a cipher which has
a block size of 16 bytes. If a mis-matching block size is used,
the code will either corrupt memory beyond the IV array, or
not fully encrypt/decrypt the IV.

This fixes a memory corruption crash when attempting to use
cast5-128 with xts, since the former has an 8 byte block size.

A test case is added to ensure the cipher creation fails with
such an invalid combination.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-12 12:00:06 +01:00
Peter Maydell
c569c537e5 virtio,vhost,pc: fixes and updates
balloon fixes wrt migration
 virtio-vsock device support
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJX0ytNAAoJECgfDbjSjVRp7GYH/0eRSoQhStOIa8LA8AmNTl6O
 +qf1oTXKIIVgnTqcs/YR/ELLiS2ncSVyMqsXD9Cm87RtqBKRgdfVR8lJ/2LfGXMD
 bRiyUdvceJ83Jz9F6UQJddpAAigQaUOzef+wKGofl3nCveSl/PdpJxektOlS2RCu
 nJvRRQza9m2XIGK2rKzY9vXNpubpcrwHoG7m74VtirAdg/s5p51e2UJ9KQuwr9nG
 zleZTs3RBrwh5W+iUOZFk3aQFGX2fnlw7P6F16QmfuQ7N/OMaaQ6WAmdz4uzzWic
 s3s9VZzZ4tIaZBZjUVgsDUubUKW4Gy5yFEUE3796VFCpiF6y7bI2gVGRT+0Qo7o=
 =/X+O
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio,vhost,pc: fixes and updates

balloon fixes wrt migration
virtio-vsock device support

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 09 Sep 2016 22:36:13 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  vhost-vsock: add virtio sockets device
  tests/acpi: speedup acpi tests
  virtio-pci: minor refactoring
  vhost: don't set vring call if no vector
  virtio-pci: error out when both legacy and modern modes are disabled
  virtio-balloon: fix stats vq migration
  virtio: add virtqueue_rewind()
  virtio-balloon: discard virtqueue element on reset
  virtio: zero vq->inuse in virtio_reset()
  virtio-pci: reduce modern_mem_bar size
  target-i386: present virtual L3 cache info for vcpus
  pc: Add 2.8 machine
  virtio-pci: use size from correct structure
  virtio: Tell the user what went wrong when event_notifier_init failed

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-12 11:25:40 +01:00
Mark Cave-Ayland
a26f7f2cb8 Update OpenBIOS images to c5542f2 built from submodule.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2016-09-12 08:14:50 +01:00
Stefan Hajnoczi
fc0b9b0e1c vhost-vsock: add virtio sockets device
Implement the new virtio sockets device for host<->guest communication
using the Sockets API.  Most of the work is done in a vhost kernel
driver so that virtio-vsock can hook into the AF_VSOCK address family.
The QEMU vhost-vsock device handles configuration and live migration
while the rx/tx happens in the vhost_vsock.ko Linux kernel driver.

The vsock device must be given a CID (host-wide unique address):

  # qemu -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3 ...

For more information see:
http://qemu-project.org/Features/VirtioVsock

[Endianness fixes and virtio-ccw support by Claudio Imbrenda
<imbrenda@linux.vnet.ibm.com>]

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
[mst: rebase to master]
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-10 00:28:08 +03:00
Marcel Apfelbaum
947b205fdb tests/acpi: speedup acpi tests
Use kvm acceleration if available.
Disable kernel-irqchip and use qemu64 cpu
for both kvm and tcg cases.

Using kvm acceleration saves about a second
and disabling kernel-irqchip has no visible
performance impact.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-10 00:08:28 +03:00
Michael S. Tsirkin
71d19fc513 virtio-pci: minor refactoring
!legacy && !modern is shorter than !(legacy || modern).
I also perfer this (less ()s) as a matter of taste.

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Jason Wang
96a3d98d2c vhost: don't set vring call if no vector
We used to set vring call fd unconditionally even if guest driver does
not use MSIX for this vritqueue at all. This will cause lots of
unnecessary userspace access and other checks for drivers does not use
interrupt at all (e.g virtio-net pmd). So check and clean vring call
fd if guest does not use any vector for this virtqueue at
all.

Perf diffs (on rx) shows lots of cpus wasted on vhost_signal() were saved:

#
    28.12%  -27.82%  [vhost]           [k] vhost_signal
    14.44%   -1.69%  [kernel.vmlinux]  [k] copy_user_generic_string
     7.05%   +1.53%  [kernel.vmlinux]  [k] __free_page_frag
     6.51%   +5.53%  [vhost]           [k] vhost_get_vq_desc
...

Pktgen tests shows 15.8% improvement on rx pps and 6.5% on tx pps.

Before: RX 2.08Mpps TX 1.35Mpps
After:  RX 2.41Mpps TX 1.44Mpps

Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Greg Kurz
3eff376977 virtio-pci: error out when both legacy and modern modes are disabled
Without presuming if we got there because of a user mistake or some
more subtle bug in the tooling, it really does not make sense to
implement a non-functional device.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Ladi Prosek
4a1e48beca virtio-balloon: fix stats vq migration
The statistics virtqueue is not migrated properly because virtio-balloon
does not include s->stats_vq_elem in the migration stream.

After migration the statistics virtqueue hangs because the host never
completes the last element (s->stats_vq_elem is NULL on the destination
QEMU).  Therefore the guest never submits new elements and the virtqueue
is hung.

Instead of changing the migration stream format in an incompatible way,
detect the migration case and rewind the virtqueue so the last element
can be completed.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Suggested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Stefan Hajnoczi
297a75e6c5 virtio: add virtqueue_rewind()
virtqueue_discard() requires a VirtQueueElement but virtio-balloon does
not migrate its in-use element.  Introduce a new function that is
similar to virtqueue_discard() but doesn't require a VirtQueueElement.

This will allow virtio-balloon to access element again after migration
with the usual proviso that the guest may have modified the vring since
last time.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Ladi Prosek
104e70cae7 virtio-balloon: discard virtqueue element on reset
The one pending element is being freed but not discarded on device
reset, which causes svq->inuse to creep up, eventually hitting the
"Virtqueue size exceeded" error.

Properly discarding the element on device reset makes sure that its
buffers are unmapped and the inuse counter stays balanced.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Stefan Hajnoczi
4b7f91ed02 virtio: zero vq->inuse in virtio_reset()
vq->inuse must be zeroed upon device reset like most other virtqueue
fields.

In theory, virtio_reset() just needs assert(vq->inuse == 0) since
devices must clean up in-flight requests during reset (requests cannot
not be leaked!).

In practice, it is difficult to achieve vq->inuse == 0 across reset
because balloon, blk, 9p, etc implement various different strategies for
cleaning up requests.  Most devices call g_free(elem) directly without
telling virtio.c that the VirtQueueElement is cleaned up.  Therefore
vq->inuse is not decremented during reset.

This patch zeroes vq->inuse and trusts that devices are not leaking
VirtQueueElements across reset.

I will send a follow-up series that refactors request life-cycle across
all devices and converts vq->inuse = 0 into assert(vq->inuse == 0) but
this more invasive approach is not appropriate for stable trees.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
2016-09-09 20:58:34 +03:00
Marcel Apfelbaum
d9997d89a4 virtio-pci: reduce modern_mem_bar size
Currently each VQ Notification Virtio Capability is allocated
on a different page. The idea is to enable split drivers within
guests, however there are no known plans to do that.
The allocation will result in a 8MB BAR, more than various
guest firmwares pre-allocates for PCI Bridges hotplug process.

Reserve 4 bytes per VQ by default and add a new parameter
"page-per-vq" to be used with split drivers.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Longpeng(Mike)
14c985cffa target-i386: present virtual L3 cache info for vcpus
Some software algorithms are based on the hardware's cache info, for example,
for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger
a resched IPI and told cpu2 to do the wakeup if they don't share low level
cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc.
The relevant linux-kernel code as bellow:

	static void ttwu_queue(struct task_struct *p, int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		......
		if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
			......
			ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
			return;
		}
		......
		ttwu_do_activate(rq, p, 0); /* access target's rq directly */
		......
	}

In real hardware, the cpus on the same socket share L3 cache, so one won't
trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a
virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs
under some workloads even if the virtual cpus belongs to the same virtual socket.

For KVM, there will be lots of vmexit due to guest send IPIs.
The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates)
and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during
the period:
        No-L3           With-L3(applied this patch)
cpu0:	363890		44582
cpu1:	373405		43109
cpu2:	340783		43797
cpu3:	333854		43409
cpu4:	327170		40038
cpu5:	325491		39922
cpu6:	319129		42391
cpu7:	306480		41035
cpu8:	161139		32188
cpu9:	164649		31024
cpu10:	149823		30398
cpu11:	149823		32455
cpu12:	164830		35143
cpu13:	172269		35805
cpu14:	179979		33898
cpu15:	194505		32754
avg:	268963.6	40129.8

The VM's topology is "1*socket 8*cores 2*threads".
After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
reduce 85%.

For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause
severe performance degradation. We had tested the overall system performance if
vcpus actually run on sparate physical socket. With L3 cache, the performance
improves 7.2%~33.1%(avg:15.7%).

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00