Commit Graph

46751 Commits

Author SHA1 Message Date
Eric Blake f3c32fce36 nbd: Detect servers that send unexpected error values
Add some debugging to flag servers that are not compliant to
the NBD protocol.  This would have flagged the server bug
fixed in commit c0301fcc.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

Message-Id: <1463006384-7734-11-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake f57e2416aa nbd: Clean up ioctl handling of qemu-nbd -c
The kernel ioctl() interface into NBD is limited to 'unsigned long';
we MUST pass in input with that type (and not int or size_t, as
there may be platform ABIs where the wrong types promote incorrectly
through var-args).  Furthermore, on 32-bit platforms, the kernel
is limited to a maximum export size of 2T (our BLKSIZE of 512 times
a SIZE_BLOCKS constrained by 32 bit unsigned long).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-8-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake 98494e3b92 nbd: Group all Linux-specific ioctl code in one place
NBD ioctl()s are used to manage an NBD client session where
initial handshake is done in userspace, but then the transmission
phase is handed off to the kernel through a /dev/nbdX device.
As such, all ioctls sent to the kernel on the /dev/nbdX fd belong
in client.c; nbd_disconnect() was out-of-place in server.c.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-7-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake ab7c548e26 nbd: Reject unknown request flags
The NBD protocol says that clients should not send a command flag
that has not been negotiated (whether by the client requesting an
option during a handshake, or because we advertise support for the
flag in response to NBD_OPT_EXPORT_NAME), and that servers should
reject invalid flags with EINVAL.  We were silently ignoring the
flags instead.  The client can't rely on our behavior, since it is
their fault for passing the bad flag in the first place, but it's
better to be robust up front than to possibly behave differently
than the client was expecting with the attempted flag.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1463006384-7734-6-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake 29b6c3b319 nbd: Improve server handling of bogus commands
We have a few bugs in how we handle invalid client commands:

- A client can send an NBD_CMD_DISC where from + len overflows,
convincing us to reply with an error and stay connected, even
though the protocol requires us to silently disconnect. Fix by
hoisting the special case sooner.

- A client can send an NBD_CMD_WRITE where from + len overflows,
where we reply to the client with EINVAL without consuming the
payload; this will normally cause us to fail if the next thing
read is not the right magic, but in rare cases, could cause us
to interpret the data payload as valid commands and do things
not requested by the client. Fix by adding a complete flag to
track whether we are in sync or must disconnect.

Furthermore, we have split the checks for bogus from/len across
two functions, when it is easier to do it all at once.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-5-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake 63d5ef869e nbd: Quit server after any write error
We should never ignore failure from nbd_negotiate_send_rep(); if
we are unable to write to the client, then it is not worth trying
to continue the negotiation.  Fortunately, the problem is not
too severe - chances are that the errors being ignored here (mainly
inability to write the reply to the client) are indications of
a closed connection or something similar, which will also affect
the next attempt to interact with the client and eventually reach
a point where the errors are detected to end the loop.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake 2cb347493c nbd: More debug typo fixes, use correct formats
Clean up some debug message oddities missed earlier; this includes
some typos, and recognizing that %d is not necessarily compatible
with uint32_t. Also add a couple messages that I found useful
while debugging things.

Signed-off-by: Eric Blake <eblake@redhat.com>

Message-Id: <1463006384-7734-3-git-send-email-eblake@redhat.com>
[Do not use PRIx16, clang complains. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake a0c303693e nbd: Use BDRV_REQ_FUA for better FUA where supported
Rather than always flushing ourselves, let the block layer
forward the FUA on to the underlying device - where all
underlying layers also understand FUA, we are now more
efficient; and where any underlying layer doesn't understand
it, now the block layer takes care of the full flush fallback
on our behalf.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Richard W.M. Jones 37146e7eaf vl.c: Add '-L help' which lists data dirs.
QEMU compiles a list of data directories from various sources.  When
consuming a QEMU binary it's useful to be able to get this list of
data directories: a primary reason is so you can list what BIOSes or
keymaps ship with this version of QEMU.  However without reproducing
the method that QEMU uses internally, it's not possible to get the
list of data directories.

This commit adds a simple '-L help' option that just lists out the
data directories as qemu calculates them:

$ ./x86_64-softmmu/qemu-system-x86_64 -L help
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu

$ ./x86_64-softmmu/qemu-system-x86_64 -L /tmp -L help
/tmp
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463416475-11728-2-git-send-email-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Greg Kurz f31e326637 KVM: use KVM_CAP_MAX_VCPU_ID
As stated in linux/Documentation/virtual/kvm/api.txt:

The maximum possible value for max_vcpu_id can be retrieved using the
KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that
max_vcpu_id is the same as the value returned from KVM_CAP_MAX_VCPUS.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Message-Id: <146424974323.5666.5471538288045048119.stgit@bahia.huguette.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Thomas Huth 142c21455b scsi-disk: Use (unsigned long) typecasts when using "%lu" format string
Some source code analyzers like cppcheck spill out a warning if
the sign of the argument does not match the format string.

Ticket: https://bugs.launchpad.net/qemu/+bug/1589564
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1465805418-15906-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Chao Peng 494e95e910 target-i386: kvm: cache KVM_GET_SUPPORTED_CPUID data
KVM_GET_SUPPORTED_CPUID ioctl is called frequently when initializing
CPU. Depends on CPU features and CPU count, the number of calls can be
extremely high which slows down QEMU booting significantly. In our
testing, we saw 5922 calls with switches:

    -cpu SandyBridge -smp 6,sockets=6,cores=1,threads=1

This ioctl takes more than 100ms, which is almost half of the total
QEMU startup time.

While for most cases the data returned from two different invocations
are not changed, that means, we can cache the data to avoid trapping
into kernel for the second time. To make sure the cache safe one
assumption is desirable: the ioctl is stateless. This is not true for
CPUID leaves in general (such as CPUID leaf 0xD, whose value depends
on guest XCR0 and IA32_XSS) but it is true of KVM_GET_SUPPORTED_CPUID,
which runs before there is a value for XCR0 and IA32_XSS.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <1465784487-23482-1-git-send-email-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Paolo Bonzini 56af2dda98 nbd: simplify the nbd_request and nbd_reply structs
These structs are never used to represent the bytes that go over the
network.  The big-endian network data is built into a uint8_t array
in nbd_{receive,send}_{request,reply}.  Remove the unused magic field,
reorder the struct to avoid holes, and remove the packed attribute.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Peter Maydell f6be672084 nbd: Don't use cpu_to_*w() functions
The cpu_to_*w() functions just compose a pointer dereference
with a byteswap. Instead use st*_p(), which handles potential
pointer misalignment and avoids the need to cast the pointer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1465575342-12146-1-git-send-email-peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Peter Maydell 773dce3c72 nbd: Don't use *_to_cpup() functions
The *_to_cpup() functions are not very useful, as they simply do
a pointer dereference and then a *_to_cpu(). Instead use either:
 * ld*_*_p(), if the data is at an address that might not be
   correctly aligned for the load
 * a local dereference and *_to_cpu(), if the pointer is
   the correct type and known to be correctly aligned

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1465570836-22211-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Thomas Huth 0fb2331254 configure: Remove unused CONFIG_SIGEV_THREAD_ID switch
The CONFIG_SIGEV_THREAD_ID switch is unused since the related code
has been removed by commit 6d32717155
("aio / timers: Remove alarm timers"), so it can safely be removed
nowadays.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1465571084-19885-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Dr. David Alan Gilbert 4fb8320a2e avx2 configure: Use primitives in test
Use the avx2 primitives during the test, thus making sure that the
compiler and assembler could actually use avx2.

This also detects the failure case on gcc 4.8.x with -save-temps
and avoids the need for the gcc version check in cutils.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1465557378-24105-3-git-send-email-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Dr. David Alan Gilbert fc6e1de9d8 Make avx2 configure test work with -O2
When configured with --extra-cflags=-O2 gcc optimised out the test
and the readelf failed the check leaving avx2 disabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1465557378-24105-2-git-send-email-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Sergey Fedorov ac99c624c6 Makefile: Fix tag file generation targets
"ctags" produces a file named "tags", not "ctags". It doesn't look
reasonable to use phony target name as a file name to remove. Just use
exact file names to remove in "ctags" and "TAGS" target receipts.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1465495115-24665-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Thomas Huth e4650c81b3 configure: Enable -Werror for MinGW builds, too
MinGW seems to compile currently without warnings, so it should
be safe to enable -Werror now for this environment, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1465373606-18486-1-git-send-email-thuth@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Paolo Bonzini e9abfcb57f clean-includes: run it once more
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Paolo Bonzini 02d0e09503 os-posix: include sys/mman.h
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h.  Include it in
sysemu/os-posix.h and remove it from everywhere else.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Thomas Huth 89266923df configure: Remove unused CONFIG_ZERO_MALLOC setting
CONFIG_ZERO_MALLOC was only used in qemu-malloc.c and
this file has been removed with the following commit:

	41a748265f
	Remove qemu_malloc/qemu_free

So we don't need this configuration setting anymore.
This patch also removes the z_version variable, since
this is now also not needed anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <1465398683-3152-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:32:35 +02:00
Peter Maydell dc278c58fa Block layer patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJXYrE3AAoJEH8JsnLIjy/WYAgP/0EG2PZ4YmxbbN9H9Z2jn7Zc
 KCXgYnM3IkL7hcKPqT4UqRDcMlEUt2RqRJYcybLWMuIb36h5SF+Hz7z3lSV92oPC
 0n719DLp325+jsCcMm4kWT69lDMOCd7Xj69zjtgpu6eZgf1zpRZlWDWoZ1XphvC7
 jnXxjXnS9JUbzND2Bq0YpIo24qatHifsuh7h2We3kVDTEEnwyK2og9cWiWNnlfU8
 dC98sgaDMCo0BHbxvraFGDS56Hmh9Uh2uNzZ9J+g/kQyv4ySZcGxarsmoFX0mqXY
 jkYLrStTAXixdIRMo3mRNItXn7sXwBA1z+4DqO+FY+6wlYb6NX/5/1Lpdnb3ph/W
 HxJa+tc+aUocbPiT4eODbSxlRweyCU2TSPxv/36wuOyBh//S1zixbrMtJK1VZsXB
 5wt2sGPi6seYRFG3ywolxuB2OE1eKhxJhmVwyVbq5lWF5anuGaRT1yyUYQGq9LR9
 QKp1nzCt3UqhEg/k3qOrcVLPB4X/m2R09M3qdUjbtNTiRVKHSgdJWFgMwsTXXu1A
 fLIX6wSe0nSVPh8QMHWYlXHR1RZMUP5IYNPIOoGgBsaERjr3ewDsyRvYRa6DFdtw
 7To3CfasFyeXss8Uva8Cc+Iv4fKX89upCnNIuTk+YWuikreInrblFvWHQ1CJ65XQ
 eEn0T7XfTcZwR50z7YEk
 =oSJo
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Thu 16 Jun 2016 15:01:27 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (39 commits)
  hbitmap: add 'pos < size' asserts
  iotests: Add test for oVirt-like storage migration
  iotests: Add test for post-mirror backing chains
  block/null: Implement bdrv_refresh_filename()
  block/mirror: Fix target backing BDS
  block: Allow replacement of a BDS by its overlay
  rbd:change error_setg() to error_setg_errno()
  iotests: 095: Clean up QEMU before showing image info
  block: Create the commit block job before reopening any image
  block: Prevent sleeping jobs from resuming if they have been paused
  block: use the block job list in qmp_query_block_jobs()
  block: use the block job list in bdrv_drain_all()
  block: Fix snapshot=on with aio=native
  block: Remove bs->zero_beyond_eof
  qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
  block: Make bdrv_load/save_vmstate coroutine_fns
  block: Allow .bdrv_load/save_vmstate() to return 0/-errno
  block: Make .bdrv_load_vmstate() vectored
  block: Introduce bdrv_preadv()
  doc: Fix mailing list address in tests/qemu-iotests/README
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-16 15:22:56 +01:00
Kevin Wolf 60251f4d3e Block patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXYqffAAoJEDuxQgLoOKytiuoH+wWkxRsuRvuNZf2feQOyyznd
 XJdycJKNnJp5PscryaHqJzc1tAapEKDE257URYkXI+hF7Vue1r6jNrfgfR6AXysK
 gVfJ0BbELYWly7ID04Q8C9P1RUmEjbYqQRnB7nua33wq9P/92RIR373p/kGVJBix
 RM4e+xYfvGYOgNODF9jJKw4R5Sw2ZVmchWlwjcYjyRW8gOiS8OaFwX7FIB3+kj+P
 ew4hsZkZmK8uroMmfC3Oe5iZfvLXzKBaMT89XiL6lUXhDizYvSkPOJoIyLrfeQ3e
 5AAv0AnQhrSfG2YNjOA3SsFiIIUEjLf8jr05Cr0YLXqr4OHk3Zoc7vsKDnY3ai8=
 =QRX6
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-06-16' into queue-block

Block patches

# gpg: Signature made Thu Jun 16 15:21:35 2016 CEST
# gpg:                using RSA key 0x3BB14202E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
#      Subkey fingerprint: 58B3 81CE 2DC8 9CF9 9730  EE64 3BB1 4202 E838 ACAD

* mreitz/tags/pull-block-for-kevin-2016-06-16:
  hbitmap: add 'pos < size' asserts
  iotests: Add test for oVirt-like storage migration
  iotests: Add test for post-mirror backing chains
  block/null: Implement bdrv_refresh_filename()
  block/mirror: Fix target backing BDS
  block: Allow replacement of a BDS by its overlay
  rbd:change error_setg() to error_setg_errno()
  iotests: 095: Clean up QEMU before showing image info
  block: Create the commit block job before reopening any image
  block: Prevent sleeping jobs from resuming if they have been paused
  block: use the block job list in qmp_query_block_jobs()
  block: use the block job list in bdrv_drain_all()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16 15:22:18 +02:00
Vladimir Sementsov-Ogievskiy 0e32119122 hbitmap: add 'pos < size' asserts
For now, fail in hbitmap_set on start + count > size will come from
hbitmap_set
  hb_count_between
    hbitmap_iter_init
      assert(pos < hb->size)

This patch adds such checks to set/get/reset functions of hbitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1465924093-76875-2-git-send-email-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Max Reitz 3dd48fdc55 iotests: Add test for oVirt-like storage migration
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-6-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Max Reitz 298c6009dc iotests: Add test for post-mirror backing chains
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-5-mreitz@redhat.com
Reviewed-by: Fam Zheng <famz@redhat.com>
[mreitz@redhat.com: Removed unnecessary imports]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Max Reitz 67882b1535 block/null: Implement bdrv_refresh_filename()
The null block driver ignores any filename used for creating its BDSs,
which allows creating such BDSs even without any filename at all. In
that case, we currently construct a JSON filename when queried instead
of a plain "null-co://" or "null-aio://". This patch implements
bdrv_refresh_filename() to remedy this behavior.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-4-mreitz@redhat.com
[mreitz@redhat.com: Added commit message]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Max Reitz 274fccee2b block/mirror: Fix target backing BDS
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Max Reitz 9bd910e2cb block: Allow replacement of a BDS by its overlay
change_parent_backing_link() asserts that the BDS to be replaced is not
used as a backing file. However, we may want to replace a BDS by its
overlay in which case that very link should not be redirected.

For instance, when doing a sync=none drive-mirror operation, we may have
the following BDS/BB forest before block job completion:

  target

  base <- source <- BlockBackend

During job completion, we want to establish the source BDS as the
target's backing node:

          target
            |
            v
  base <- source <- BlockBackend

This makes the target a valid replacement for the source:

          target <- BlockBackend
            |
            v
  base <- source

Without this modification to change_parent_backing_link() we have to
inject the target into the graph before the source is its backing node,
thus temporarily creating a wrong graph:

  target <- BlockBackend

  base <- source

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-2-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Vikhyat Umrao 87cd3d20e1 rbd:change error_setg() to error_setg_errno()
Ceph RBD block driver does not use error_setg_errno() where
it is possible to use. This patch replaces error_setg()
from error_setg_errno().

Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Message-id: 1462780319-5796-1-git-send-email-vumrao@redhat.com
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Fam Zheng 6ea66b590c iotests: 095: Clean up QEMU before showing image info
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464944872-24484-1-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Alberto Garcia 834fe28ddf block: Create the commit block job before reopening any image
If the base or overlay images need to be reopened in read-write mode
but the block_job_create() call fails then no one will put those
images back in read-only mode.

We can solve this problem easily by calling block_job_create() first.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: aa495045770a6f1a7cc5d408397a17c75097fdd8.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Alberto Garcia 0824afda0c block: Prevent sleeping jobs from resuming if they have been paused
If we pause a block job and drain its BlockDriverState we want that
the job remains inactive until we call block_job_resume() again.

However if we pause the job while it is sleeping then it will resume
when the sleep timer fires.

This patch prevents that from happening by checking if the job has
been paused after it comes back from sleeping.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 3d9011151512326b890d22bdab3530244ef349d7.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Alberto Garcia f0f55deda2 block: use the block job list in qmp_query_block_jobs()
qmp_query_block_jobs() uses bdrv_next() to look for block jobs, but
this function can only find those in top-level BlockDriverStates.

This patch uses block_job_next() instead.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: a8b7e5497b7c1fa67c12fcceae1630d01c3b1f96.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Alberto Garcia eb1364ceac block: use the block job list in bdrv_drain_all()
bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
over all top-level BlockDriverStates. Therefore the code is unable to
find block jobs in other nodes.

This patch uses block_job_next() to iterate over all block jobs.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 55ee7d7d4a65c28aa1a1b28823897ef326f328e2.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Kevin Wolf 418690447a block: Fix snapshot=on with aio=native
snapshot=on creates a temporary overlay that is always opened with
cache=unsafe (the cache mode specified by the user is only for the
actual image file and its children). This means that we must not inherit
the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to
use Linux AIO with cache=unsafe results in an error.

Reproducer without this patch:

$ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on
qemu-system-x86_64: -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was
specified, but it requires cache.direct=on, which was not specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf c9d20029f4 block: Remove bs->zero_beyond_eof
It is always true for open images now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf 734a77584a qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
We don't really want to go through the block layer in order to read from
or write to the vmstate in a qcow2 image. Doing so required a few ugly
hacks like saving and restoring the old image size (because writing to
vmstate offsets would increase the image size) or disabling the "reads
after EOF = zeroes" logic. When calling the right functions directly,
these hacks aren't necessary any more.

Note that .bdrv_vmstate_load/save() return 0 instead of the number of
bytes in case of success now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf 1a8ae82217 block: Make bdrv_load/save_vmstate coroutine_fns
This allows drivers to share code between normal I/O and vmstate
accesses.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf b433d9424d block: Allow .bdrv_load/save_vmstate() to return 0/-errno
The return value of .bdrv_load/save_vmstate() can be any non-negative
number in case of success now. It used to be bytes/-errno.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 5ddda0b8f0 block: Make .bdrv_load_vmstate() vectored
This brings it in line with .bdrv_save_vmstate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf f1e8474115 block: Introduce bdrv_preadv()
We already have a byte-based bdrv_pwritev(), but the read counterpart
was still missing. This commit adds it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Thomas Huth 48bea96572 doc: Fix mailing list address in tests/qemu-iotests/README
The address of the mailing list is qemu-devel@nongnu.org
instead of qemu-devel@savannah.nongnu.org. And while we're
at it, also mention the qemu-block mailing list here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf ccb9dc1012 linux-aio: Cancel BH if not needed
linux-aio uses a BH in order to make sure that the remaining completions
are processed even in nested event loops of completion callbacks in
order to avoid deadlocks.

There is no need, however, to have the BH overhead for the first call
into qemu_laio_completion_bh() or after all pending completions have
already been processed. Therefore, this patch calls directly into
qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
the BH after qemu_laio_completion_bh() has processed all pending
completions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 23b0d9fb1d block: Don't enforce 512 byte minimum alignment
If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.

The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 9d52aa3c38 raw-posix: Implement .bdrv_co_preadv/pwritev
The raw-posix block driver actually supports byte-aligned requests now
on non-O_DIRECT images, like it already (and previously incorrectly)
claimed in bs->request_alignment.

For some block drivers this means that a RMW cycle can be avoided when
they write sub-sector metadata e.g. for cluster allocation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 2174f12bde raw-posix: Switch to bdrv_co_* interfaces
In order to use the modern byte-based .bdrv_co_preadv/pwritev()
interface, this patch switches raw-posix to coroutine-based interfaces
as a first step. In terms of semantics and performance, it doesn't make
a difference with the existing code whether we go from a coroutine to a
callback-based interface already in block/io.c or only in linux-aio.c

As there have been concerns in the past that this change may be a step
in the wrong direction with respect to a possible AIO fast path, the
old callback-based interface for linux-aio is left around and can be
reactivated when a fast path (e.g. directly from virtio-blk dataplane,
bypassing the whole block layer) is implemented.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Kevin Wolf 9896c8765f block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00