Commit Graph

4033 Commits

Author SHA1 Message Date
Eric Blake
3a7404b31e vhdx: Switch to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based calls
into the block layer from the vhdx driver.

Ideally, the vhdx driver should switch to doing everything
byte-based, but that's a more invasive change that requires a
bit more auditing.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake
04a11d87d1 replication: Switch to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based calls
into the block layer from the replication driver.

Ideally, the replication driver should switch to doing everything
byte-based, but that's a more invasive change that requires a
bit more auditing.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake
609841a3f8 qcow: Switch to a byte-based driver
We are gradually moving away from sector-based interfaces, towards
byte-based.  The qcow driver is now ready to fully utilize the
byte-based callback interface, as long as we override the default
alignment to still be 512 (needed at least for asserts present
because of encryption, but easier to do everywhere than to audit
which sub-sector requests are handled correctly, especially since
we no longer recommend qcow for new disk images).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake
d1326a786d qcow: Switch qcow_co_writev to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the internals of the qcow
driver write function, by iterating over offset/bytes instead of
sector_num/nb_sectors, and with a rename of index_in_cluster and
repurposing of n to track bytes instead of sectors.

A later patch will then switch the qcow driver as a whole over
to byte-based operation.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake
a15312b017 qcow: Switch qcow_co_readv to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the internals of the qcow
driver read function, by iterating over offset/bytes instead of
sector_num/nb_sectors, and with a rename of index_in_cluster and
repurposing of n to track bytes instead of sectors.

A later patch will then switch the qcow driver as a whole over
to byte-based operation.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake
787993a543 qcow: Switch get_cluster_offset to be byte-based
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the internal helper function
get_cluster_offset(), by changing n_start and n_end to be byte
offsets rather than sector indices within the cluster being
allocated.  However, assert that these values are still
sector-aligned (at least qcrypto_block_encrypt() still wants that).
For now we get that alignment for free because we still use
sector-based driver callbacks.

A later patch will then switch the qcow driver as a whole over
to byte-based operation; but will still leave things at sector
alignments as it is not worth auditing the qcow image format
to worry about sub-sector requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Eric Blake
d08c2a245f parallels: Switch to byte-based calls
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based calls
into the block layer from the parallels driver.

Ideally, the parallels driver should switch to doing everything
byte-based, but that's a more invasive change that requires a
bit more auditing.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng
c436e3d014 file-posix: Fix EINTR handling
EINTR should be checked against errno, not ret. While fixing the bug,
collect the branches with a switch block.

Also, change the return value from -ENOSTUP to -ENOSPC when the actual
issue is request range passes EOF, which should be distinguishable from
the case of error == ENOSYS by the caller, so that it could still retry
with other byte ranges, whereas it shouldn't retry anymore upon ENOSYS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng
1439b9c110 iscsi: Don't blindly use designator length in response for memcpy
Per SCSI definition the designator_length we receive from INQUIRY is 8,
12 or at most 16, but we should be careful because the remote iscsi
target may misbehave, otherwise we could have a buffer overflow.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng
e06f4639d8 qcow2: Fix src_offset in copy offloading
Not updating src_offset will result in wrong data being written to dst
image.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
33d70fb6fa file-posix: Implement co versions of discard/flush
This simplifies file-posix by implementing the coroutine variants of
the discard and flush BlockDriver callbacks. These were the last
remaining users of paio_submit(), which can be removed now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
8b24cd1415 qcow2: Free allocated clusters on write error
If we managed to allocate the clusters, but then failed to write the
data, there's a good chance that we'll still be able to free the
clusters again in order to avoid cluster leaks (the refcounts are
cached, so even if we can't write them out right now, we may be able to
do so when the VM is resumed after a werror=stop/enospc pause).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
2018-06-29 14:20:56 +02:00
Markus Armbruster
796d323945 block/crypto: Simplify block_crypto_{open,create}_opts_init()
block_crypto_open_opts_init() and block_crypto_create_opts_init()
contain a virtual visit of QCryptoBlockOptions and
QCryptoBlockCreateOptions less member "format", respectively.

Change their callers to put member "format" in the QDict, so they can
use the generated visitors for these types instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng
37aec7d75e block: Move request tracking to children in copy offloading
in_flight and tracked requests need to be tracked in every layer during
recursion. For now the only user is qemu-img convert where overlapping
requests and IOThreads don't exist, therefore this change doesn't make
much difference form user point of view, but it is incorrect as part of
the API. Fix it.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Fam Zheng
354d930dc6 qcow2: Remove dead check on !ret
In the beginning of the function, we initialize the local variable to 0,
and in the body of the function, we check the assigned values and exit
the loop immediately. So here it can never be non-zero.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
93f4e2ff4b file-posix: Make .bdrv_co_truncate asynchronous
This moves the code to resize an image file to the thread pool to avoid
blocking.

Creating large images with preallocation with blockdev-create is now
actually a background job instead of blocking the monitor (and most
other things) until the preallocation has completed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
1bc5f09f2e block: Use tracked request for truncate
When growing an image, block drivers (especially protocol drivers) may
initialise the newly added area. I/O requests to the same area need to
wait for this initialisation to be completed so that data writes don't
get overwritten and reads don't read uninitialised data.

To avoid overhead in the fast I/O path by adding new locking in the
protocol drivers and to restrict the impact to requests that actually
touch the new area, reuse the existing tracked request infrastructure in
block/io.c and mark all discard requests as serialising.

With this change, it is safe for protocol drivers to make
.bdrv_co_truncate actually asynchronous.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
3d9f2d2af6 block: Move bdrv_truncate() implementation to io.c
This moves the bdrv_truncate() implementation from block.c to block/io.c
so it can have access to the tracked requests infrastructure.

This involves making refresh_total_sectors() public (in block_int.h).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
47e86b868d qcow2: Remove coroutine trampoline for preallocate_co()
All callers are coroutine_fns now, so we can just directly call
preallocate_co().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
061ca8a368 block: Convert .bdrv_truncate callback to coroutine_fn
bdrv_truncate() is an operation that can block (even for a quite long
time, depending on the PreallocMode) in I/O paths that shouldn't block.
Convert it to a coroutine_fn so that we have the infrastructure for
drivers to make their .bdrv_co_truncate implementation asynchronous.

This change could potentially introduce new race conditions because
bdrv_truncate() isn't necessarily executed atomically any more. Whether
this is a problem needs to be evaluated for each block driver that
supports truncate:

* file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The
  protocol drivers are trivially safe because they don't actually yield
  yet, so there is no change in behaviour.

* copy-on-read, crypto, raw-format: Essentially just filter drivers that
  pass the request to a child node, no problem.

* qcow2: The implementation modifies metadata, so it needs to hold
  s->lock to be safe with concurrent I/O requests. In order to avoid
  double locking, this requires pulling the locking out into
  preallocate_co() and using qcow2_write_caches() instead of
  bdrv_flush().

* qed: Does a single header update, this is fine without locking.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Kevin Wolf
ae5475e82f qcow2: Fix qcow2_truncate() error return value
If qcow2_alloc_clusters_at() returns an error, we do need to negate it
to get back the positive errno code for error_setg_errno(), but we still
need to return the negative error code.

Fixes: 772d1f973f
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29 14:20:56 +02:00
Markus Armbruster
e6af90f3fb block/crypto: Pacify Coverity after commit f853465aac
Coverity can't see that qobject_input_visitor_new_flat_confused()
returns non-null when it doesn't set @local_err.  Check the return
value instead, like all the other callers do.

Fixes: CID 1393615
Fixes: CID 1393616
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Nishanth Aravamudan
ed6e216171 linux-aio: properly bubble up errors from initialization
laio_init() can fail for a couple of reasons, which will lead to a NULL
pointer dereference in laio_attach_aio_context().

To solve this, add a aio_setup_linux_aio() function which is called
early in raw_open_common. If this fails, propagate the error up. The
signature of aio_get_linux_aio() was not modified, because it seems
preferable to return the actual errno from the possible failing
initialization calls.

Additionally, when the AioContext changes, we need to associate a
LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context
callback and call the new aio_setup_linux_aio(), which will allocate a
new AioContext if needed, and return errors on failures. If it fails for
any reason, fallback to threaded AIO with an error message, as the
device is already in-use by the guest.

Add an assert that aio_get_linux_aio() cannot return NULL.

Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com>
Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-27 13:06:34 +01:00
Anton Nefedov
29cd0403f1 qapi: remove empty flat union branches and types
Flat unions may now have uncovered branches, so it is possible to get rid
of empty types defined for that purpose only.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1529311206-76847-3-git-send-email-anton.nefedov@virtuozzo.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-22 16:33:46 +02:00
Max Reitz
481debaa32 block/mirror: Add copy mode QAPI interface
This patch allows the user to specify whether to use active or only
background mode for mirror block jobs.  Currently, this setting will
remain constant for the duration of the entire block job.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:05:16 +02:00
Max Reitz
d06107ade0 block/mirror: Add active mirroring
This patch implements active synchronous mirroring.  In active mode, the
passive mechanism will still be in place and is used to copy all
initially dirty clusters off the source disk; but every write request
will write data both to the source and the target disk, so the source
cannot be dirtied faster than data is mirrored to the target.  Also,
once the block job has converged (BLOCK_JOB_READY sent), source and
target are guaranteed to stay in sync (unless an error occurs).

Active mode is completely optional and currently disabled at runtime.  A
later patch will add a way for users to enable it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-13-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:05:15 +02:00
Max Reitz
429076e88d block/mirror: Add MirrorBDSOpaque
This will allow us to access the block job data when the mirror block
driver becomes more complex.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:59 +02:00
Max Reitz
72d10a9421 block/dirty-bitmap: Add bdrv_dirty_iter_next_area
This new function allows to look for a consecutively dirty area in a
dirty bitmap.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:57 +02:00
Max Reitz
a33fbb4f8b hbitmap: Add @advance param to hbitmap_iter_next()
This new parameter allows the caller to just query the next dirty
position without moving the iterator.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-8-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:55 +02:00
Max Reitz
138f9fffb8 block/mirror: Use source as a BdrvChild
With this, the mirror_top_bs is no longer just a technically required
node in the BDS graph but actually represents the block job operation.

Also, drop MirrorBlockJob.source, as we can reach it through
mirror_top_bs->backing.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:54 +02:00
Max Reitz
1181e19a6d block/mirror: Wait for in-flight op conflicts
This patch makes the mirror code differentiate between simply waiting
for any operation to complete (mirror_wait_for_free_in_flight_slot())
and specifically waiting for all operations touching a certain range of
the virtual disk to complete (mirror_wait_on_conflicts()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:53 +02:00
Max Reitz
12aa40822d block/mirror: Use CoQueue to wait on in-flight ops
Attach a CoQueue to each in-flight operation so if we need to wait for
any we can use it to wait instead of just blindly yielding and hoping
for some operation to wake us.

A later patch will use this infrastructure to allow requests accessing
the same area of the virtual disk to specifically wait for each other.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:52 +02:00
Max Reitz
2e1990b26e block/mirror: Convert to coroutines
In order to talk to the source BDS (and maybe in the future to the
target BDS as well) directly, we need to convert our existing AIO
requests into coroutine I/O requests.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:48 +02:00
Max Reitz
4295c5fc61 block/mirror: Pull out mirror_perform()
When converting mirror's I/O to coroutines, we are going to need a point
where these coroutines are created.  mirror_perform() is going to be
that point.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:43 +02:00
Greg Kurz
f45280cbf6 block: fix QEMU crash with scsi-hd and drive_del
Removing a drive with drive_del while it is being used to run an I/O
intensive workload can cause QEMU to crash.

An AIO flush can yield at some point:

blk_aio_flush_entry()
 blk_co_flush(blk)
  bdrv_co_flush(blk->root->bs)
   ...
    qemu_coroutine_yield()

and let the HMP command to run, free blk->root and give control
back to the AIO flush:

    hmp_drive_del()
     blk_remove_bs()
      bdrv_root_unref_child(blk->root)
       child_bs = blk->root->bs
       bdrv_detach_child(blk->root)
        bdrv_replace_child(blk->root, NULL)
         blk->root->bs = NULL
        g_free(blk->root) <============== blk->root becomes stale
       bdrv_unref(child_bs)
        bdrv_delete(child_bs)
         bdrv_close()
          bdrv_drained_begin()
           bdrv_do_drained_begin()
            bdrv_drain_recurse()
             aio_poll()
              ...
              qemu_coroutine_switch()

and the AIO flush completion ends up dereferencing blk->root:

  blk_aio_complete()
   scsi_aio_complete()
    blk_get_aio_context(blk)
     bs = blk_bs(blk)
 ie, bs = blk->root ? blk->root->bs : NULL
            ^^^^^
            stale

The problem is that we should avoid making block driver graph
changes while we have in-flight requests. Let's drain all I/O
for this BB before calling bdrv_root_unref_child().

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
0f12264e7a block: Allow graph changes in bdrv_drain_all_begin/end sections
bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
did a subtree drain for each of them. This works fine as long as the
graph is static, but sadly, reality looks different.

If the graph changes so that root nodes are added or removed, we would
have to compensate for this. bdrv_next() returns each root node only
once even if it's the root node for multiple BlockBackends or for a
monitor-owned block driver tree, which would only complicate things.

The much easier and more obviously correct way is to fundamentally
change the way the functions work: Iterate over all BlockDriverStates,
no matter who owns them, and drain them individually. Compensation is
only necessary when a new BDS is created inside a drain_all section.
Removal of a BDS doesn't require any action because it's gone afterwards
anyway.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
6cd5c9d7b2 block: ignore_bds_parents parameter for drain functions
In the future, bdrv_drained_all_begin/end() will drain all invidiual
nodes separately rather than whole subtrees. This means that we don't
want to propagate the drain to all parents any more: If the parent is a
BDS, it will already be drained separately. Recursing to all parents is
unnecessary work and would make it an O(n²) operation.

Prepare the drain function for the changed drain_all by adding an
ignore_bds_parents parameter to the internal implementation that
prevents the propagation of the drain to BDS parents. We still (have to)
propagate it to non-BDS parents like BlockBackends or Jobs because those
are not drained separately.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
c8ca33d06d block: Move bdrv_drain_all_begin() out of coroutine context
Before we can introduce a single polling loop for all nodes in
bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
context like we already do for bdrv_do_drained_begin().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
0109e7e6f8 block: Defer .bdrv_drain_begin callback to polling phase
We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
done with propagating the drain through the graph and are doing the
single final BDRV_POLL_WHILE().

Just schedule the coroutine with the callback and increase bs->in_flight
to make sure that the polling phase will wait for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
dcf94a23b1 block: Don't poll in parent drain callbacks
bdrv_do_drained_begin() is only safe if we have a single
BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
that parent callbacks introduce a nested polling loop that could cause
graph changes while we're traversing the graph.

Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
node without waiting for its requests to complete. These requests will
be waited for in the BDRV_POLL_WHILE() call down the call chain.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
fe4f0614ef block: Drain recursively with a single BDRV_POLL_WHILE()
Anything can happen inside BDRV_POLL_WHILE(), including graph
changes that may interfere with its callers (e.g. child list iteration
in recursive callers of bdrv_do_drained_begin).

Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
end of bdrv_do_drained_begin() to avoid such effects. The recursion
happens now inside the loop condition. As the graph can only change
between bdrv_drain_poll() calls, but not inside of it, doing the
recursion here is safe.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
d30b8e64b7 block: Remove bdrv_drain_recurse()
For bdrv_drain(), recursively waiting for child node requests is
pointless because we didn't quiesce their parents, so new requests could
come in anyway. Letting the function work only on a single node makes it
more consistent.

For subtree drains and drain_all, we already have the recursion in
bdrv_do_drained_begin(), so the extra recursion doesn't add anything
either.

Remove the useless code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
89bd030533 block: Really pause block jobs on drain
We already requested that block jobs be paused in .bdrv_drained_begin,
but no guarantee was made that the job was actually inactive at the
point where bdrv_drained_begin() returned.

This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
uses it to make bdrv_drain_poll() consider block jobs using the node to
be drained.

For the test case to work as expected, we have to switch from
block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
considered active and must be waited for when draining the node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
1cc8e54ada block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
Commit 91af091f92 added an additional aio_poll() to BDRV_POLL_WHILE()
in order to make sure that all pending BHs are executed on drain. This
was the wrong place to make the fix, as it is useless overhead for all
other users of the macro and unnecessarily complicates the mechanism.

This patch effectively reverts said commit (the context has changed a
bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
loop condition for drain.

The effect is probably hard to measure in any real-world use case
because actual I/O will dominate, but if I run only the initialisation
part of 'qemu-img convert' where it calls bdrv_block_status() for the
whole image to find out how much data there is copy, this phase actually
needs only roughly half the time after this patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
c13ad59f01 block: Don't manually poll in bdrv_drain_all()
All involved nodes are already idle, we called bdrv_do_drain_begin() on
them.

The comment in the code suggested that this was not correct because the
completion of a request on one node could spawn a new request on a
different node (which might have been drained before, so we wouldn't
drain the new request). In reality, new requests to different nodes
aren't spawned out of nothing, but only in the context of a parent
request, and they aren't submitted to random nodes, but only to child
nodes. As long as we still poll for the completion of the parent request
(which we do), draining each root node separately is good enough.

Remove the additional polling code from bdrv_drain_all_begin() and
replace it with an assertion that all nodes are already idle after we
drained them separately.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
7d40d9ef9d block: Remove 'recursive' parameter from bdrv_drain_invoke()
All callers pass false for the 'recursive' parameter now. Remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
79ab8b21dc block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
bdrv_do_drain_begin/end() implement already everything that
bdrv_drain_all_begin/end() need and currently still do manually: Disable
external events, call parent drain callbacks, call block driver
callbacks.

It also does two more things:

The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
stood out in the test case by behaving different from the other drain
variants. Adding this is not only safe, but in fact a bug fix.

The second is calling bdrv_drain_recurse(). We already do that later in
the same function in a loop, so basically doing an early first iteration
doesn't hurt.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
bb67568954 test-bdrv-drain: bdrv_drain() works with cross-AioContext events
As long as nobody keeps the other I/O thread from working, there is no
reason why bdrv_drain() wouldn't work with cross-AioContext events. The
key is that the root request we're waiting for is in the AioContext
we're polling (which it always is for bdrv_drain()) so that aio_poll()
is woken up in the end.

Add a test case that shows that it works. Remove the comment in
bdrv_drain() that claims otherwise.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf
b008326744 block: Remove deprecated -drive option serial
The -drive option serial was deprecated in QEMU 2.10. It's time to
remove it.

Tests need to be updated to set the serial number with -global instead
of using the -drive option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
d083f954a9 rbd: New parameter key-secret
Legacy -drive supports "password-secret" parameter that isn't
available with -blockdev / blockdev-add.  That's because we backed out
our first try to provide it there due to interface design doubts, in
commit 577d8c9a81, v2.9.0.

This is the second try.  It brings back the parameter, except it's
named "key-secret" now.

Let's review our reasons for backing out the first try, as stated in
the commit message:

    * BlockdevOptionsRbd member @password-secret isn't actually a
      password, it's a key generated by Ceph.

Addressed by the rename.

    * We're not sure where member @password-secret belongs (see the
      previous commit).

See previous commit.

    * How @password-secret interacts with settings from a configuration
      file specified with @conf is undocumented.

Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
a3699de4dd rbd: New parameter auth-client-required
Parameter auth-client-required lets you configure authentication
methods.  We tried to provide that in v2.9.0, but backed out due to
interface design doubts (commit 464444fcc1).

This commit is similar to what we backed out, but simpler: we use a
list of enumeration values instead of a list of objects with a member
of enumeration type.

Let's review our reasons for backing out the first try, as stated in
the commit message:

    * The implementation uses deprecated rados_conf_set() key
      "auth_supported".  No biggie.

Fixed: we use "auth-client-required".

    * The implementation makes -drive silently ignore invalid parameters
      "auth" and "auth-supported.*.X" where X isn't "auth".  Fixable (in
      fact I'm going to fix similar bugs around parameter server), so
      again no biggie.

That fix is commit 2836284db6.  This commit doesn't bring the bugs
back.

    * BlockdevOptionsRbd member @password-secret applies only to
      authentication method cephx.  Should it be a variant member of
      RbdAuthMethod?

We've had time to ponder, and we decided to stick to the way Ceph
configuration works: the key configured separately, and silently
ignored if the authentication method doesn't use it.

    * BlockdevOptionsRbd member @user could apply to both methods cephx
      and none, but I'm not sure it's actually used with none.  If it
      isn't, should it be a variant member of RbdAuthMethod?

Likewise.

    * The client offers a *set* of authentication methods, not a list.
      Should the methods be optional members of BlockdevOptionsRbd instead
      of members of list @auth-supported?  The latter begs the question
      what multiple entries for the same method mean.  Trivial question
      now that RbdAuthMethod contains nothing but @type, but less so when
      RbdAuthMethod acquires other members, such the ones discussed above.

Again, we decided to stick to the way Ceph configuration works, except
we make auth-client-required a list of enumeration values instead of a
string containing keywords separated by delimiters.

    * How BlockdevOptionsRbd member @auth-supported interacts with
      settings from a configuration file specified with @conf is
      undocumented.  I suspect it's untested, too.

Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
f853465aac block: Make remaining uses of qobject input visitor more robust
Remaining uses of qobject_input_visitor_new_keyval() in the block
subsystem:

* block_crypto_open_opts_init()
  Currently doesn't visit any non-string scalars, thus safe.  It's
  called from
  - block_crypto_open_luks()
    Creates the QDict with qemu_opts_to_qdict_filtered(), which
    creates only string scalars, but has a TODO asking for other types.
  - qcow_open()
  - qcow2_open(), qcow2_co_invalidate_cache(), qcow2_reopen_prepare()

* block_crypto_create_opts_init(), called from
  - block_crypto_co_create_opts_luks()
    Also creates the QDict with qemu_opts_to_qdict_filtered().

* vdi_co_create_opts()
  Also creates the QDict with qemu_opts_to_qdict_filtered().

Replace these uses by qobject_input_visitor_new_flat_confused() for
robustness.  This adds crumpling.  Right now, that's a no-op, but if
we ever extend these things in non-flat ways, crumpling will be
needed.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
af91062ee1 block: Factor out qobject_input_visitor_new_flat_confused()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
92adf9dbcd block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts()
The following pattern occurs in the .bdrv_co_create_opts() methods of
parallels, qcow, qcow2, qed, vhdx and vpc:

    qobj = qdict_crumple_for_keyval_qiv(qdict, errp);
    qobject_unref(qdict);
    qdict = qobject_to(QDict, qobj);
    if (qdict == NULL) {
         ret = -EINVAL;
         goto done;
    }

    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
    [...]
    ret = 0;
done:
    qobject_unref(qdict);
    [...]
    return ret;

If qobject_to() fails, we return failure without setting errp.  That's
wrong.  As far as I can tell, it cannot fail here.  Clean it up
anyway, by removing the useless conversion.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
374c52467a block: Fix -drive for certain non-string scalars
The previous commit fixed -blockdev breakage due to misuse of the
qobject input visitor's keyval flavor in bdrv_file_open().  The commit
message explain why using the plain flavor would be just as wrong; it
would break -drive.  Turns out we break it in three places:
nbd_open(), sd_open() and ssh_file_open().  They are even marked
FIXME.  Example breakage:

    $ qemu-system-x86 -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off
    qemu-system-x86: -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off: Invalid parameter type for 'numeric', expected: boolean

Fix it the same way: replace qdict_crumple() by
qdict_crumple_for_keyval_qiv(), and switch from plain to the keyval
flavor.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
e5af0da1dc block: Fix -blockdev for certain non-string scalars
Configuration flows through the block subsystem in a rather peculiar
way.  Configuration made with -drive enters it as QemuOpts.
Configuration made with -blockdev / blockdev-add enters it as QAPI
type BlockdevOptions.  The block subsystem uses QDict, QemuOpts and
QAPI types internally.  The precise flow is next to impossible to
explain (I tried for this commit message, but gave up after wasting
several hours).  What I can explain is a flaw in the BlockDriver
interface that leads to this bug:

    $ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234
    qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid

QMP blockdev-add is broken the same way.

Here's what happens.  The block layer passes configuration represented
as flat QDict (with dotted keys) to BlockDriver methods
.bdrv_file_open().  The QDict's members are typed according to the
QAPI schema.

nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with
qdict_crumple() and a qobject input visitor.

This visitor comes in two flavors.  The plain flavor requires scalars
to be typed according to the QAPI schema.  That's the case here.  The
keyval flavor requires string scalars.  That's not the case here.
nfs_file_open() uses the latter, and promptly falls apart for members
@user, @group, @tcp-syn-count, @readahead-size, @page-cache-size,
@debug.

Switching to the plain flavor would fix -blockdev, but break -drive,
because there the scalars arrive in nfs_file_open() as strings.

The proper fix would be to replace the QDict by QAPI type
BlockdevOptions in the BlockDriver interface.  Sadly, that's beyond my
reach right now.

Next best would be to fix the block layer to always pass correctly
typed QDicts to the BlockDriver methods.  Also beyond my reach.

What I can do is throw another hack onto the pile: have
nfs_file_open() convert all members to string, so use of the keyval
flavor actually works, by replacing qdict_crumple() by new function
qdict_crumple_for_keyval_qiv().

The pattern "pass result of qdict_crumple() to
qobject_input_visitor_new_keyval()" occurs several times more:

* qemu_rbd_open()

  Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only
  string members, its only a latent bug.  Fix it anyway.

* parallels_co_create_opts(), qcow_co_create_opts(),
  qcow2_co_create_opts(), bdrv_qed_co_create_opts(),
  sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts()

  These work, because they create the QDict with
  qemu_opts_to_qdict_filtered(), which creates only string scalars.
  The function sports a TODO comment asking for better typing; that's
  going to be fun.  Use qdict_crumple_for_keyval_qiv() to be safe.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Max Reitz
609f45ea95 block: Add block-specific QDict header
There are numerous QDict functions that have been introduced for and are
used only by the block layer.  Move their declarations into an own
header file to reflect that.

While qdict_extract_subqdict() is in fact used outside of the block
layer (in util/qemu-config.c), it is still a function related very
closely to how the block layer works with nested QDicts, namely by
sometimes flattening them.  Therefore, its declaration is put into this
header as well and util/qemu-config.c includes it with a comment stating
exactly which function it needs.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
[Copyright note tweaked, superfluous includes dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
deadbb8ebb iscsi: Drop deprecated -drive parameter "filename"
Parameter "filename" is deprecated since commit 5c3ad1a6a8, v2.10.0.
Time to get rid of it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
bb9f762ff3 rbd: Drop deprecated -drive parameter "filename"
Parameter "filename" is deprecated since commit 91589d9e5c, v2.10.0.
Time to get rid of it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Vladimir Sementsov-Ogievskiy
b598e531f1 qapi: add x-block-dirty-bitmap-merge
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180606182449.1607-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:32 -04:00
Vladimir Sementsov-Ogievskiy
8b1402ce80 block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap
Add locks and remove comments about BQL accordingly to
dirty_bitmap_mutex definition in block_int.h.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180606182449.1607-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:31 -04:00
Paolo Bonzini
b133c27f5d block: simplify code around releasing bitmaps
QLIST_REMOVE does not require walking the list, and once the "bitmap"
argument is removed from bdrv_do_release_matching_dirty_bitmap_locked
the code simplifies a lot and it is worth inlining everything in the
callers of bdrv_do_release_matching_dirty_bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180326104037.6894-1-pbonzini@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:31 -04:00
Paolo Bonzini
ab41fc4853 block: remove bdrv_dirty_bitmap_make_anon
All this function is doing will be repeated by
bdrv_do_release_matching_dirty_bitmap_locked, except
resetting bm->persistent.  But even that does not matter
because the bitmap will be freed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180323164254.26487-1-pbonzini@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:31 -04:00
Max Reitz
ddf3b47ef4 qcow2: Do not mark inactive images corrupt
When signaling a corruption on a read-only image, qcow2 already makes
fatal events non-fatal (i.e., they will not result in the image being
closed, and the image header's corrupt flag will not be set).  This is
necessary because we cannot set the corrupt flag on read-only images,
and it is possible because further corruption of read-only images is
impossible.

Inactive images are effectively read-only, too, so we should do the same
for them.  bdrv_is_writable() can tell us whether an image can actually
be written to, so use its result instead of !bs->read_only.

(Otherwise, the assert(!(bs->open_flags & BDRV_O_INACTIVE)) in
bdrv_co_pwritev() will fail, crashing qemu.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-3-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Alberto Garcia
bc33c047d1 throttle: Fix crash on reopen
The throttle block filter can be reopened, and with this it is
possible to change the throttle group that the filter belongs to.

The way the code does that is the following:

  - On throttle_reopen_prepare(): create a new ThrottleGroupMember
    and attach it to the new throttle group.

  - On throttle_reopen_commit(): detach the old ThrottleGroupMember,
    delete it and replace it with the new one.

The problem with this is that by replacing the ThrottleGroupMember the
previous value of io_limits_disabled is lost, causing an assertion
failure in throttle_co_drain_end().

This problem can be reproduced by reopening a throttle node:

   $QEMU -monitor stdio
   -object throttle-group,id=tg0,x-iops-total=1000 \
   -blockdev node-name=hd0,driver=qcow2,file.driver=file,file.filename=hd.qcow2 \
   -blockdev node-name=root,driver=throttle,throttle-group=tg0,file=hd0,read-only=on

   (qemu) block_stream root
   block/throttle.c:214: throttle_co_drain_end: Assertion `tgm->io_limits_disabled' failed.

Since we only want to change the throttle group on reopen there's no
need to create a ThrottleGroupMember and discard the old one. It's
easier if we simply detach it from its current group and attach it to
the new one.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180608151536.7378-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Vladimir Sementsov-Ogievskiy
7eb24009db block/qcow2-bitmap: fix free_bitmap_clusters
This assert may fail, because bitmap_table is not initialized. Just
drop it, as it's obvious, that bitmap_table_load sets bitmap_table
parameter only when returning zero.

Reported-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180608101225.2575-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz
3cce51c919 qcow2: Repair OFLAG_COPIED when fixing leaks
Repairing OFLAG_COPIED is usually safe because it is done after the
refcounts have been repaired.  Therefore, it we did not find anyone else
referencing a data or L2 cluster, it makes no sense to not set
OFLAG_COPIED -- and the other direction (clearing OFLAG_COPIED) is
always safe, anyway, it may just induce leaks.

Furthermore, if OFLAG_COPIED is actually consistent with a wrong (leaky)
refcount, we will decrement the refcount with -r leaks, but OFLAG_COPIED
will then be wrong.  qemu-img check should not produce images that are
more corrupted afterwards then they were before.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1527085
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180509200059.31125-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz
d1402b5026 block: Add Error parameter to bdrv_amend_options
Looking at the qcow2 code that is riddled with error_report() calls,
this is really how it should have been from the start.

Along the way, turn the target_version/current_version comparisons at
the beginning of qcow2_downgrade() into assertions (the caller has to
make sure these conditions are met), and rephrase the error message on
using compat=1.1 to get refcount widths other than 16 bits.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180509210023.20283-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz
b8cf1913a9 block/file-posix: File locking during creation
When creating a file, we should take the WRITE and RESIZE permissions.
We do not need either for the creation itself, but we do need them for
clearing and resizing it.  So we can take the proper permissions by
replacing O_TRUNC with an explicit truncation to 0, and by taking the
appropriate file locks between those two steps.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180509215336.31304-3-mreitz@redhat.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz
d0a96155de block/file-posix: Pass FD to locking helpers
raw_apply_lock_bytes() and raw_check_lock_bytes() currently take a
BDRVRawState *, but they only use the lock_fd field.  During image
creation, we do not have a BDRVRawState, but we do have an FD; so if we
want to reuse the functions there, we should modify them to receive only
the FD.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180509215336.31304-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Paolo Bonzini
68acc99f14 sheepdog: remove huge BSS object
block/sheepdog.o has a 4M static variable that is 90% of QEMU's whole .bss
section.  Replace it with a heap-allocated block, and make it smaller too
since only the inode header is actually being used.

bss size goes down from 4464280 to 269976.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180523160721.14018-3-pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-06-05 10:15:12 -04:00
Paolo Bonzini
03b036cc0c sheepdog: cleanup repeated expression
The expression "SD_INODE_SIZE - sizeof(inode.data_vdi_id)" already has a macro
defined for the same value (though with a nicer definition using offsetof).
Replace it.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180523160721.14018-2-pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-06-05 10:15:12 -04:00
Peter Maydell
0d514fa234 Pull request
* Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)
 
    If the underlying storage supports copy offloading, qemu-img convert will
    use it instead of performing reads and writes.  This avoids data transfers
    and thus frees up storage bandwidth for other purposes.  SCSI EXTENDED COPY
    and Linux copy_file_range(2) are used to implement this optimization.
 
  * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbFSBoAAoJEJykq7OBq3PISpEIAIcMao4/rzinAWXzS+ncK9LO
 6FtRVpgutpHaWX2ayySaz5n2CdR3cNMrpCI7sjY2Kw0lrdkqxPgl5n0SWD+VCl4W
 7+JLz/uF0iUV8X+99e7WGAjZbm9LSlxgn5AQKfrrwyPf0ZfzoYQ5nBMcQ6xjEeQP
 48j2WqJqN9/u8RBD07o11yn0+CE5g56/f12xVjR5ASVodzsAmcZ2OQRMQbM01isU
 1mBekJQkDxJkt5l13Rql8+t+vWz8/9BEW2c/eIDKvoayMqYJpdfKv4DqLloIuHnc
 3RkquA0zUuKtl7xEnEkH/We7fi4QPGW/vyBN7ychS/zKzZFQrXmwqrAuFSw3dKU=
 =vZp+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

 * Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)

   If the underlying storage supports copy offloading, qemu-img convert will
   use it instead of performing reads and writes.  This avoids data transfers
   and thus frees up storage bandwidth for other purposes.  SCSI EXTENDED COPY
   and Linux copy_file_range(2) are used to implement this optimization.

 * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning

# gpg: Signature made Mon 04 Jun 2018 12:20:08 BST
# gpg:                using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  main-loop: drop spin_counter
  qemu-img: Convert with copy offloading
  block-backend: Add blk_co_copy_range
  iscsi: Implement copy offloading
  iscsi: Create and use iscsi_co_wait_for_task
  iscsi: Query and save device designator when opening
  file-posix: Implement bdrv_co_copy_range
  qcow2: Implement copy offloading
  raw: Implement copy offloading
  raw: Check byte range uniformly
  block: Introduce API for copy offloading

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 18:34:04 +01:00
Peter Maydell
f67c9b693a acpi, vhost, misc: fixes, features
vDPA support, fix to vhost blk RO bit handling, some include path
 cleanups, NFIT ACPI table.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y
 LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk
 EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY
 odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK
 a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR
 KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM=
 =3+V0
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

acpi, vhost, misc: fixes, features

vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 01 Jun 2018 17:25:19 BST
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (31 commits)
  vhost-blk: turn on pre-defined RO feature bit
  ACPI testing: test NFIT platform capabilities
  nvdimm, acpi: support NFIT platform capabilities
  tests/.gitignore: add entry for generated file
  arch_init: sort architectures
  ui: use local path for local headers
  qga: use local path for local headers
  colo: use local path for local headers
  migration: use local path for local headers
  usb: use local path for local headers
  sd: fix up include
  vhost-scsi: drop an unused include
  ppc: use local path for local headers
  rocker: drop an unused include
  e1000e: use local path for local headers
  ioapic: fix up includes
  ide: use local path for local headers
  display: use local path for local headers
  trace: use local path for local headers
  migration: drop an unused include
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 10:15:16 +01:00
Fam Zheng
b5679fa49c block-backend: Add blk_co_copy_range
It's a BlockBackend wrapper of the BDS interface.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-10-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng
604dfaaa32 iscsi: Implement copy offloading
Issue EXTENDED COPY (LID1) command to implement the copy_range API.

The parameter data construction code is modified from libiscsi's
iscsi-dd.c.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-9-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng
66e75c03b2 iscsi: Create and use iscsi_co_wait_for_task
This loop is repeated a growing number times. Make a helper.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180601092648.24614-8-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng
cc9743c236 iscsi: Query and save device designator when opening
The device designator data returned in INQUIRY command will be useful to
fill in source/target fields during copy offloading. Do this when
connecting to the target and save the data for later use.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-7-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng
1efad060d7 file-posix: Implement bdrv_co_copy_range
With copy_file_range(2), we can implement the bdrv_co_copy_range
semantics.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-6-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng
fd9fcd37a8 qcow2: Implement copy offloading
The two callbacks are implemented quite similarly to the read/write
functions: bdrv_co_copy_range_from maps for read and calls into bs->file
or bs->backing depending on the allocation status; bdrv_co_copy_range_to
maps for write and calls into bs->file.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-5-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng
72d219e2f9 raw: Implement copy offloading
Just pass down to ->file.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-4-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:47 +01:00
Fam Zheng
3844553852 raw: Check byte range uniformly
We don't verify the request range against s->size in the I/O callbacks
except for raw_co_pwritev. This is inconsistent (especially for
raw_co_pwrite_zeroes and raw_co_pdiscard), so fix them, in the meanwhile
make the helper reusable by the coming new callbacks.

Note that in most cases the block layer already verifies the request
byte range against our reported image length, before invoking the driver
callbacks.  The exception is during image creating, after
blk_set_allow_write_beyond_eof(blk, true) is called. But in that case,
the requests are not directly from the user or guest. So there is no
visible behavior change in adding the check code.

The int64_t -> uint64_t inconsistency, as shown by the type casting, is
pre-existing due to the interface.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-3-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:47 +01:00
Fam Zheng
fcc6767836 block: Introduce API for copy offloading
Introduce the bdrv_co_copy_range() API for copy offloading.  Block
drivers implementing this API support efficient copy operations that
avoid reading each block from the source device and writing it to the
destination devices.  Examples of copy offload primitives are SCSI
EXTENDED COPY and Linux copy_file_range(2).

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-2-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:47 +01:00
Michael S. Tsirkin
0d8c41dae5 block: use local path for local headers
When pulling in headers that are in the same directory as the C file (as
opposed to one in include/), we should use its relative path, without a
directory.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2018-05-31 04:16:06 +03:00
Kevin Wolf
3fb588a0f2 block/create: Mark blockdev-create stable
We're ready to declare the blockdev-create job stable. This renames the
corresponding QMP command from x-blockdev-create to blockdev-create.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:18 +02:00
Kevin Wolf
e5ab4347f9 block/create: Make x-blockdev-create a job
This changes the x-blockdev-create QMP command so that it doesn't block
the monitor and the main loop any more, but starts a background job that
performs the image creation.

The basic job as implemented here is all that is necessary to make image
creation asynchronous and to provide a QMP interface that can be marked
stable, but it still lacks a few features that jobs usually provide: The
job will ignore pause commands and it doesn't publish more than very
basic progress yet (total-progress is 1 and current-progress advances
from 0 to 1 when the driver callbacks returns). These features can be
added later without breaking compatibility.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:07 +02:00
Kevin Wolf
1266c9b9f5 job: Add error message for failing jobs
So far we relied on job->ret and strerror() to produce an error message
for failed jobs. Not surprisingly, this tends to result in completely
useless messages.

This adds a Job.error field that can contain an error string for a
failing job, and a parameter to job_completed() that sets the field. As
a default, if NULL is passed, we continue to use strerror(job->ret).

All existing callers are changed to pass NULL. They can be improved in
separate patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:01 +02:00
Kevin Wolf
4a5f2779ba vhdx: Fix vhdx_co_create() return value
.bdrv_co_create() is supposed to return 0 on success, but vhdx could
return a positive value instead. Fix this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-29 22:19:57 +02:00
Kevin Wolf
53618dd838 vdi: Fix vdi_co_do_create() return value
.bdrv_co_create() is supposed to return 0 on success, but vdi could
return a positive value instead. Fix this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-29 22:19:57 +02:00
Alberto Garcia
7af5eea9b3 qcow2: Fix Coverity warning when calculating the refcount cache size
MIN_REFCOUNT_CACHE_SIZE is 4 and the cluster size is guaranteed to be
at most 2MB, so the minimum refcount cache size (in bytes) is always
going to fit in a 32-bit integer.

Coverity doesn't know that, and since we're storing the result in a
uint64_t (*refcount_cache_size) it thinks that we need the 64 bits and
that we probably want to do a 64-bit multiplication to prevent the
result from being truncated.

This is a false positive in this case, but it's a fair warning.
We could do a 64-bit multiplication to get rid of it, but since we
know that a 32-bit variable is enough to store this value let's simply
reuse min_refcount_cache, make it a normal int and stop doing casts.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-29 20:09:17 +02:00
Kevin Wolf
30a5c887bf job: Move progress fields to Job
BlockJob has fields .offset and .len, which are actually misnomers today
because they are no longer tied to block device sizes, but just progress
counters. As such they make a lot of sense in generic Jobs.

This patch moves the fields to Job and renames them to .progress_current
and .progress_total to describe their function better.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf
2e1795b581 job: Add job_transition_to_ready()
The transition to the READY state was still performed in the BlockJob
layer, in the same function that sent the BLOCK_JOB_READY QMP event.

This patch brings the state transition to the Job layer and implements
the QMP event using a notifier called from the Job layer, like we
already do for other events related to state transitions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf
198c49cc8d job: Add job_yield()
This moves block_job_yield() to the Job layer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf
3d70ff53b6 job: Move completion and cancellation to Job
This moves the top-level job completion and cancellation functions from
BlockJob to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf
62c9e4162a job: Switch transactions to JobTxn
This doesn't actually move any transaction code to Job yet, but it
renames the type for transactions from BlockJobTxn to JobTxn and makes
them contain Jobs rather than BlockJobs

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
6a74c075ac job: Move job_finish_sync() to Job
block_job_finish_sync() doesn't contain anything block job specific any
more, so it can be moved to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
3453d97243 job: Move .complete callback to Job
This moves the .complete callback that tells a READY job to complete
from BlockJobDriver to JobDriver. The wrapper function job_complete()
doesn't require anything block job specific any more and can be moved
to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
b69f777dd9 job: Add job_drain()
block_job_drain() contains a blk_drain() call which cannot be moved to
Job, so add a new JobDriver callback JobDriver.drain which has a common
implementation for all BlockJobs. In addition to this we keep the
existing BlockJobDriver.drain callback that is called by the common
drain implementation for all block jobs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
004e95df98 job: Convert block_job_cancel_async() to Job
block_job_cancel_async() did two things that were still block job
specific:

* Setting job->force. This field makes sense on the Job level, so we can
  just move it. While at it, rename it to job->force_cancel to make its
  purpose more obvious.

* Resetting the I/O status. This can't be moved because generic Jobs
  don't have an I/O status. What the function really implements is a
  user resume, except without entering the coroutine. Consequently, it
  makes sense to call the .user_resume driver callback here which
  already resets the I/O status.

  The old block_job_cancel_async() has two separate if statements that
  check job->iostatus != BLOCK_DEVICE_IO_STATUS_OK and job->user_paused.
  However, the former condition always implies the latter (as is
  asserted in block_job_iostatus_reset()), so changing the explicit call
  of block_job_iostatus_reset() on the former condition with the
  .user_resume callback on the latter condition is equivalent and
  doesn't need to access any BlockJob specific state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
4ad351819b job: Move single job finalisation to Job
This moves the finalisation of a single job from BlockJob to Job.

Some part of this code depends on job transactions, and job transactions
call this code, we introduce some temporary calls from Job functions to
BlockJob ones. This will be fixed once transactions move to Job, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
bb02b65c7d job: Move BlockJobCreateFlags to Job
This renames the BlockJobCreateFlags constants, moves a few JOB_INTERNAL
checks to job_create() and the auto_{finalize,dismiss} fields from
BlockJob to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
b15de82867 job: Move pause/resume functions to Job
While we already moved the state related to job pausing to Job, the
functions to do were still BlockJob only. This commit moves them over to
Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
5d43e86e11 job: Add job_sleep_ns()
There is nothing block layer specific about block_job_sleep_ns(), so
move the function to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
da01ff7f38 job: Move coroutine and related code to Job
This commit moves some core functions for dealing with the job coroutine
from BlockJob to Job. This includes primarily entering the coroutine
(both for the first and reentering) and yielding explicitly and at pause
points.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
1908a5590c job: Move defer_to_main_loop to Job
Move the defer_to_main_loop functionality from BlockJob to Job.

The code can be simplified because we can use job->aio_context in
job_defer_to_main_loop_bh() now, instead of having to access the
BlockDriverState.

Probably taking the data->aio_context lock in addition was already
unnecessary in the old code because we didn't actually make use of
anything protected by the old AioContext except getting the new
AioContext, in case it changed between scheduling the BH and running it.
But it's certainly unnecessary now that the BDS isn't accessed at all
any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf
daa7f2f946 job: Move cancelled to Job
We cannot yet move the whole logic around job cancelling to Job because
it depends on quite a few other things that are still only in BlockJob,
but we can move the cancelled field at least.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf
80fa2c756b job: Add reference counting
This moves reference counting from BlockJob to Job.

In order to keep calling the BlockJob cleanup code when the job is
deleted via job_unref(), introduce a new JobDriver.free callback. Every
block job must use block_job_free() for this callback, this is asserted
in block_job_create().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf
a50c2ab858 job: Move state transitions to Job
This moves BlockJob.status and the closely related functions
(block_)job_state_transition() and (block_)job_apply_verb to Job. The
two QAPI enums are renamed to JobStatus and JobVerb.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf
252291eaea job: Add JobDriver.job_type
This moves the job_type field from BlockJobDriver to JobDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf
8e4c87000f job: Rename BlockJobType into JobType
QAPI types aren't externally visible, so we can rename them without
causing problems. Before we add a job type to Job, rename the enum
so it can be used for more than just block jobs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf
33e9e9bd62 job: Create Job, JobDriver and job_create()
This is the first step towards creating an infrastructure for generic
background jobs that aren't tied to a block device. For now, Job only
stores its ID and JobDriver, the rest stays in BlockJob.

The following patches will move over more parts of BlockJob to Job if
they are meaningful outside the context of a block job.

BlockJob.driver is now redundant, but this patch leaves it around to
avoid unnecessary churn. The next patches will get rid of almost all of
its uses anyway so that it can be removed later with much less churn.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Peter Maydell
ae8622ec19 sheepdog: Remove unnecessary NULL check in sd_prealloc()
In commit 8b9ad56e9c, we removed the code that could result
in our getting to sd_prealloc()'s out_with_err_set label with a
NULL blk pointer. That makes the NULL check in the error-handling
path unnecessary, and Coverity gripes about it (CID 1390636).
Delete the redundant check.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23 13:29:06 +02:00
Laurent Vivier
4a4ff4c58f Remove unnecessary variables for function return value
Re-run Coccinelle script scripts/coccinelle/return_directly.cocci

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
ppc part
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2018-05-20 08:48:13 +03:00
Kevin Wolf
c82be42cc8 nfs: Remove processed options from QDict
Commit c22a03454 QAPIfied option parsing in the NFS block driver, but
forgot to remove all the options we processed. Therefore, we get an
error in bdrv_open_inherit(), which thinks the remaining options are
invalid. Trying to open an NFS image will result in an error like this:

    Block protocol 'nfs' doesn't support the option 'server.host'

Remove all options from the QDict to make the NFS driver work again.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20180516160816.26259-1-kwolf@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-05-16 13:37:47 -04:00
Kevin Wolf
54b7af4369 nfs: Fix error path in nfs_options_qdict_to_qapi()
Don't throw away local_err, but propagate it to errp.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20180516161034.27440-1-kwolf@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-05-16 13:37:47 -04:00
Max Reitz
228345bf5d block: Support BDRV_REQ_WRITE_UNCHANGED in filters
Update the rest of the filter drivers to support
BDRV_REQ_WRITE_UNCHANGED.  They already forward write request flags to
their children, so we just have to announce support for it.

This patch does not cover the replication driver because that currently
does not support flags at all, and because it just grabs the WRITE
permission for its children when it can, so we should be fine just
submitting the incoming WRITE_UNCHANGED requests as normal writes.

It also does not cover format drivers for similar reasons.  They all use
bdrv_format_default_perms() as their .bdrv_child_perm() implementation
so they just always grab the WRITE permission for their file children
whenever possible.  In addition, it often would be difficult to
ascertain whether incoming unchanging writes end up as unchanging writes
in their files.  So we just leave them as normal potentially changing
writes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-7-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
1b1a920b71 block/quorum: Support BDRV_REQ_WRITE_UNCHANGED
We just need to forward it to quorum's children (except in case of a
rewrite because of corruption), but for that we first have to support
flags in child requests at all.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-6-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
7adcf59fec block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-5-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
c6035964f8 block: Add BDRV_REQ_WRITE_UNCHANGED flag
This flag signifies that a write request will not change the visible
disk content.  With this flag set, it is sufficient to have the
BLK_PERM_WRITE_UNCHANGED permission instead of BLK_PERM_WRITE.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-4-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Max Reitz
6c6f24fd84 block: Add COR filter driver
This adds a simple copy-on-read filter driver.  It relies on the already
existing COR functionality in the central block layer code, which may be
moved here once we no longer need it there.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Alberto Garcia
52253998ec qcow2: Give the refcount cache the minimum possible size by default
The L2 and refcount caches have default sizes that can be overridden
using the l2-cache-size and refcount-cache-size (an additional
parameter named cache-size sets the combined size of both caches).

Unless forced by one of the aforementioned parameters, QEMU will set
the unspecified sizes so that the L2 cache is 4 times larger than the
refcount cache.

This is based on the premise that the refcount metadata needs to be
only a fourth of the L2 metadata to cover the same amount of disk
space. This is incorrect for two reasons:

 a) The amount of disk covered by an L2 table depends solely on the
    cluster size, but in the case of a refcount block it depends on
    the cluster size *and* the width of each refcount entry.
    The 4/1 ratio is only valid with 16-bit entries (the default).

 b) When we talk about disk space and L2 tables we are talking about
    guest space (L2 tables map guest clusters to host clusters),
    whereas refcount blocks are used for host clusters (including
    L1/L2 tables and the refcount blocks themselves). On a fully
    populated (and uncompressed) qcow2 file, image size > virtual size
    so there are more refcount entries than L2 entries.

Problem (a) could be fixed by adjusting the algorithm to take into
account the refcount entry width. Problem (b) could be fixed by
increasing a bit the refcount cache size to account for the clusters
used for qcow2 metadata.

However this patch takes a completely different approach and instead
of keeping a ratio between both cache sizes it assigns as much as
possible to the L2 cache and the remainder to the refcount cache.

The reason is that L2 tables are used for every single I/O request
from the guest and the effect of increasing the cache is significant
and clearly measurable. Refcount blocks are however only used for
cluster allocation and internal snapshots and in practice are accessed
sequentially in most cases, so the effect of increasing the cache is
negligible (even when doing random writes from the guest).

So, make the refcount cache as small as possible unless the user
explicitly asks for a larger one.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 9695182c2eb11b77cb319689a1ebaa4e7c9d6591.1523968389.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Alberto Garcia
74c44a5934 Fix error message about compressed clusters with OFLAG_COPIED
Compressed clusters are not supposed to have the COPIED bit set.
"qemu-img check" detects that and prints an error message reporting
the number of the affected host cluster. This doesn't make much sense
because compressed clusters are not aligned to host clusters, so it
would be better to report the offset instead. Plus, the calculation is
wrong and it uses the raw L2 entry as if it was simply an offset.

This patch fixes the error message and reports the offset of the
compressed cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 0f687957feb72e80c740403191a47e607c2463fe.1523376013.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Kevin Wolf
bd21935b50 blockjob: Add block_job_driver()
The backup block job directly accesses the driver field in BlockJob. Add
a wrapper for getting it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf
dee81d5111 blockjob: Introduce block_job_ratelimit_get_delay()
This gets us rid of more direct accesses to BlockJob fields from the
job drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf
18bb69287e blockjob: Implement block_job_set_speed() centrally
All block job drivers support .set_speed and all of them duplicate the
same code to implement it. Move that code to blockjob.c and remove the
now useless callback.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf
f05fee508f blockjob: Move RateLimit to BlockJob
Every block job has a RateLimit, and they all do the exact same thing
with it, so it should be common infrastructure. Move the struct field
for a start.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf
05df8a6a2b blockjob: Wrappers for progress counter access
Block job drivers are not expected to mess with the internals of the
BlockJob object, so provide wrapper functions for one of the cases where
they still do it: Updating the progress counter.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:49 +02:00
Eric Blake
e18a58b4e3 block: Merge .bdrv_co_writev{,_flags} in drivers
We have too many driver callback interfaces; simplify the mess
somewhat by merging the flags parameter of .bdrv_co_writev_flags()
into .bdrv_co_writev().  Note that as long as a driver doesn't set
.supported_write_flags, the flags argument will be 0 and behavior is
identical.  Also note that the public function bdrv_co_writev() still
lacks a flags argument; so the driver signature is thus intentionally
slightly different.  But that's not the end of the world, nor the first
time that the driver interface differs slightly from the public
interface.

Ideally, we should be rewriting all of these drivers to use modern
byte-based interfaces.  But that's a more invasive patch to write
and audit, compared to the simplification done here.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Eric Blake
edfab6a08b block: Drop last of the sector-based aio callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all drivers with aio callbacks are using the
byte-based interfaces, we can remove the sector-based versions.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Eric Blake
918889b291 vxhs: Switch to byte-based callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based callbacks
in the vxhs driver.

Note that the driver was already using byte-based calls for
performing actual I/O, so this just gets rid of a round trip
of scaling; however, as I don't know if VxHS is tolerant of
non-sector AIO operations, I went with the conservative approach
of adding .bdrv_refresh_limits to override the block layer
defaults back to the pre-patch value of 512.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Eric Blake
e8e16d4baf rbd: Switch to byte-based callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based callbacks
in the rbd driver.

Note that the driver was already using byte-based calls for
performing actual I/O, so this just gets rid of a round trip
of scaling; however, as I don't know if RBD is tolerant of
non-sector AIO operations, I went with the conservate approach
of adding .bdrv_refresh_limits to override the block layer
defaults back to the pre-patch value of 512.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Eric Blake
b3241e9274 null: Switch to byte-based read/write
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based callbacks
in the null-co and null-aio drivers.

Note that since the null driver does nothing on writes, it trivially
supports the BDRV_REQ_FUA flag (all writes have already landed to
the same bit-bucket without needing an extra flush call).  Also, since
the null driver does just as well with byte-based requests, we can
now avoid cycles wasted on read-modify-write by taking advantage of
the block layer now defaulting the alignment to 1 instead of 512.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Eric Blake
de7056a3f9 file-win32: Switch to byte-based callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based.  Make the change for the last few sector-based callbacks
in the file-win32 driver.

Note that the driver was already using byte-based calls for
performing actual I/O, so this just gets rid of a round trip
of scaling; however, as I don't know if Windows is tolerant of
non-sector AIO operations, I went with the conservative approach
of modifying .bdrv_refresh_limits to override the block layer
defaults back to the pre-patch value of 512.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Eric Blake
e31f6864a6 block: Support byte-based aio callbacks
We are gradually moving away from sector-based interfaces, towards
byte-based.  Add new sector-based aio callbacks for read and write,
to match the fact that bdrv_aio_pdiscard is already byte-based.

Ideally, drivers should be converted to use coroutine callbacks
rather than aio; but that is not quite as trivial (and if we were
to do that conversion, the null-aio driver would disappear), so for
the short term, converting the signature but keeping things with
aio is easier.  However, we CAN declare that a driver that uses
the byte-based aio interfaces now defaults to byte-based
operations, and must explicitly provide a refresh_limits override
to stick with larger alignments (making the alignment issues more
obvious directly in the drivers touched in the next few patches).

Once all drivers are converted, the sector-based aio callbacks will
be removed; in the meantime, a FIXME comment is added due to a
slight inefficiency that will be touched up as part of that later
cleanup.

Simplify some instances of 'bs->drv' into 'drv' while touching this,
since the local variable already exists to reduce typing.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Daniel Henrique Barboza
7803696d85 block-backend: simplify blk_get_aio_context
blk_get_aio_context verifies if BlockDriverState bs is not NULL,
return bdrv_get_aio_context(bs) if true or qemu_get_aio_context()
otherwise. However, bdrv_get_aio_context from block.c already does
this verification itself, also returning qemu_get_aio_context()
if bs is NULL:

AioContext *bdrv_get_aio_context(BlockDriverState *bs)
{
    return bs ? bs->aio_context : qemu_get_aio_context();
}

This patch simplifies blk_get_aio_context to simply call
bdrv_get_aio_context instead of replicating the same logic.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-15 16:11:41 +02:00
Stefan Hajnoczi
31be8a2a97 block/file-posix: add x-check-page-cache=on|off option
mincore(2) checks whether pages are resident.  Use it to verify that
page cache has been dropped.

You can trigger a verification failure by mmapping the image file from
another process that loads a byte from a page, forcing it to become
resident.  bdrv_co_invalidate_cache() will fail while that process is
alive.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180427162312.18583-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-11 16:43:05 +01:00
Stefan Hajnoczi
dd577a26ff block/file-posix: implement bdrv_co_invalidate_cache() on Linux
On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
this to drop page cache on the destination host during shared storage
migration.  This way the destination host will read the latest copy of
the data and will not use stale data from the page cache.

The flow is as follows:

1. Source host writes out all dirty pages and inactivates drives.
2. QEMU_VM_EOF is sent on migration stream.
3. Destination host invalidates caches before accessing drives.

This patch enables live migration even with -drive cache.direct=off.

* Terms and conditions may apply, please see patch for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180427162312.18583-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-11 16:43:05 +01:00
Kevin Wolf
a2cb9239b7 sheepdog: Fix sd_co_create_opts() memory leaks
Both the option string for the 'redundancy' option and the
SheepdogRedundancy object that is created accordingly could be leaked in
error paths. This fixes the memory leaks.

Reported by Coverity (CID 1390614 and 1390641).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20180503153509.22223-1-kwolf@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-05-08 10:47:27 -04:00
Max Reitz
eb36639f7b block/mirror: Make cancel always cancel pre-READY
Commit b76e4458b1 made the mirror block
job respect block-job-cancel's @force flag: With that flag set, it would
now always really cancel, even post-READY.

Unfortunately, it had a side effect: Without that flag set, it would now
never cancel, not even before READY.  Considering that is an
incompatible change and not noted anywhere in the commit or the
description of block-job-cancel's @force parameter, this seems
unintentional and we should revert to the previous behavior, which is to
immediately cancel the job when block-job-cancel is called before source
and target are in sync (i.e. before the READY event).

Cc: qemu-stable@nongnu.org
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1572856
Reported-by: Yanan Fu <yfu@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180501220509.14152-2-mreitz@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-05-08 10:47:27 -04:00
Stefan Hajnoczi
ddc4115efd block/mirror: honor ratelimit again
Commit b76e4458b1 ("block/mirror: change
the semantic of 'force' of block-job-cancel") accidentally removed the
ratelimit in the mirror job.

Reintroduce the ratelimit but keep the block-job-cancel force=true
behavior that was added in commit
b76e4458b1.

Note that block_job_sleep_ns() returns immediately when the job is
cancelled.  Therefore it's safe to unconditionally call
block_job_sleep_ns() - a cancelled job does not sleep.

This commit fixes the non-deterministic qemu-iotests 185 output.  The
test relies on the ratelimit to make the job sleep until the 'quit'
command is processed.  Previously the job could complete before the
'quit' command was received since there was no ratelimit.

Cc: Liang Li <liliang.opensource@gmail.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180424123527.19168-1-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-05-08 10:47:27 -04:00
Peter Maydell
c8b7e627b4 nbd patches for 2018-05-04
- Vladimir Sementsov-Ogievskiy: 0/2 fix coverity bugs
 - Eric Blake: nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE
 - Eric Blake: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply
 -----BEGIN PGP SIGNATURE-----
 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
 
 iQEcBAABCAAGBQJa7F9jAAoJEKeha0olJ0NqwZIH/3LbaF7Q0CcuB6d+nQo3jYm2
 fGIb8pV4pZLgC4D4qrelXP2Ttn0RNMLNdq2UR6F66/MAFj2/sF4gRE/p82exZRK3
 be2hDQpzKOTgrF7SQN9ccI1df7nrMhvXgo1Y4rhSFZKtMTBPZirDFdAP/2xklSzI
 jMHE/Iq9Ng16303FR5KEiVtmAWxAaagapcvEKIrD0SpoiAX9jk6dAT5EwOHLi4Lf
 2CCe/5iBFC96zLE2xJ8n+esZ1chJJp/2gubmYON/lwLx5fXqYVowywDVNzv+uc8A
 sg2VMjb/UySOLJ6IxdxgGdln0w46RB7u55nRnyH6LcU2IdBeladhkI7Oh/reOTI=
 =wnw9
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-05-04' into staging

nbd patches for 2018-05-04

- Vladimir Sementsov-Ogievskiy: 0/2 fix coverity bugs
- Eric Blake: nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE
- Eric Blake: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply

# gpg: Signature made Fri 04 May 2018 14:25:55 BST
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2018-05-04:
  nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply
  nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE
  migration/block-dirty-bitmap: fix memory leak in dirty_bitmap_load_bits
  nbd/client: fix nbd_negotiate_simple_meta_context

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-04 14:42:46 +01:00
Eric Blake
acfd8f7a5f nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply
The NBD spec is proposing a relaxation of NBD_CMD_BLOCK_STATUS
where a server may have the final extent per context give a
length beyond the original request, if it can easily prove that
subsequent bytes have the same status, on the grounds that a
client can take advantage of this information for fewer block
status requests.  Since qemu 2.12 as a client always sends
NBD_CMD_FLAG_REQ_ONE, and rejects a server that sends extra
length, the upstream NBD spec will probably limit this behavior
to clients that don't request REQ_ONE semantics; but it doesn't
hurt to relax qemu to always be permissive of this server
behavior, even if it continues to use REQ_ONE.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180503222626.1303410-1-eblake@redhat.com>
Reviewed-by:  Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-05-04 08:23:39 -05:00
Marc-André Lureau
f5a74a5a50 qobject: Modify qobject_ref() to return obj
For convenience and clarity, make it possible to call qobject_ref() at
the time when the reference is associated with a variable, or
argument, by making qobject_ref() return the same pointer as given.
Use that to simplify the callers.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180419150145.24795-5-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Useless change to qobject_ref_impl() dropped, commit message improved
slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-05-04 08:27:53 +02:00
Marc-André Lureau
cb3e7f08ae qobject: Replace qobject_incref/QINCREF qobject_decref/QDECREF
Now that we can safely call QOBJECT() on QObject * as well as its
subtypes, we can have macros qobject_ref() / qobject_unref() that work
everywhere instead of having to use QINCREF() / QDECREF() for QObject
and qobject_incref() / qobject_decref() for its subtypes.

The replacement is mechanical, except I broke a long line, and added a
cast in monitor_qmp_cleanup_req_queue_locked().  Unlike
qobject_decref(), qobject_unref() doesn't accept void *.

Note that the new macros evaluate their argument exactly once, thus no
need to shout them.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased, semantic conflict resolved, commit message improved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-05-04 08:27:53 +02:00
Vladimir Sementsov-Ogievskiy
605bc8be42 qcow2: try load bitmaps only once
Checking reopen by existence of some bitmaps is wrong, as it may be
some other bitmaps, or on the other hand, user may remove bitmaps. This
criteria is bad. To simplify things and make behavior more predictable
let's just add a flag to remember, that we've already tried to load
bitmaps on open and do not want do it again.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180411122606.367301-2-vsementsov@virtuozzo.com
[mreitz: Changed comment wording according to Eric Blake's suggestion]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-04-16 13:35:32 +02:00
Kevin Wolf
2fe4bba19b commit/stream: Reset delay_ns
Streaming and the commit block job only want to apply throttling when
they actually copied data instead of skipping it, so they made the
calculation of delay_ns conditional. However, delay_ns isn't reset when
skipping some sectors, so instead of not waiting, the old delay is
applied again.

Properly reset delay_ns where needed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-04-10 16:33:47 +02:00
Jeff Cody
bfb15b4bec block/rbd: remove processed options from qdict
Commit 4bfb274 added some QAPIfication of option parsing in
qemu_rbd_open().  We need to remove all the options we processed,
otherwise in bdrv_open_inherit() we will think the remaining options are
invalid.

(This needs to go in 2.12 to avoid a regression that prevents rbd
from being opened.)

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2018-04-04 12:05:13 -04:00
Peter Maydell
fd69ad866b Block layer patches
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaw6JlAAoJEH8JsnLIjy/W8JcP/06bxQ+8056Vz/+oRAm3VEZN
 iFJ1w+A1cFVpaqeE1fl+nfluyT+bU4J8YIyeHQlgckFyzQz13v0cSSSpBJYfJAO9
 XSO/g96kILwN+h16NNdKX2K7uewVPZgyDbNf+FzM2UHuSSXdSp0ZbeW1yXvB6cpx
 hk2exA0Vfxm2dq2Xzz00EzGCWdZAnWWesj9pimASQ1W0426t7PJRCrRaUVawaJfT
 J+IB4zsF4BJ49CvqmeFqyPfDV9q1GF3c9EkXzFgw7qYY2C4F4bAKv82AxWiKEiRI
 cm1bHEdVyDFuBxzMuGOhZS3g2YkJDnEbkfJ0zj5cKmmYO2Pc0q8DeAMZTvy3aobL
 R5UawZTgPv2O71hOlRFl/YZwMjhb80BXDq9mx7TrrhQ5/wCAUOs4e1XYHn+kzL3W
 ohp5r74Qj1s84pGi8sPaXWm8myxalkTicT1cKFHztb+3+DwVPQIOXhMkMqrJyAK8
 3GwZsGmtzI5DCIyF3whLbElN/OHRM9zr/T+uwhRQw8bs/XYPU27f367Y+JJGAaZq
 xTD7vlF49XSOoSDTYcvJfODlloJODyL+CxFS/vpJKfMYgDWYQZiwvYeIEywmwIYE
 2bEdgcN/9BOdYK4NRuJzDY+Vra7iVnDvXhsZ8UClp/SPLspJ54zwCny9f5rSIDJF
 NSiBcahA5femreYqAfWz
 =quyk
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Tue 03 Apr 2018 16:48:53 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests: Test abnormally large size in compressed cluster descriptor
  qemu-iotests: Use ppc64 qemu_arch on ppc64le host
  iotests: Test preallocated truncate of 2G image
  block/file-posix: Fix fully preallocated truncate
  iotests: fix 208 for luks format
  iotests: Update 186 after commit ac64273c66
  iotests: Update 051 and 186 after commit 1454509726
  block: handle invalid lseek returns gracefully
  gluster: Fix blockdev-add with server.N.type=unix

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-04-04 14:00:07 +01:00
Peter Maydell
e5efa1f5f2 -----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaw6duAAoJEL2+eyfA3jBXoYUP/07v6FQ0E91eprL2Zg2uowA1
 u6e/neVhlzZgMpOevWfnjScBYHA5wjPjM2ZbKckbnFjH60s7gVBuYtbnwcChRMY1
 LXJts1LivuQOru5QIf0Fwi2W4vkPzV8rQsWp6nmnIDf/Jzs2iKEe6jy6A2enSLYM
 FKOrJc1ViNWFpW1y/7mV8gHIAChEC3wEL9L2JUQEvi4VQD7uCwJv0YS5Ci+xjV21
 /DPaVfZA4VMNNi+FTRezW+l5m0zOxenxUnDXqpP5yjgPyTpyqfHSCCVK54HLQ2L/
 8uMblucjRRFXl6zsX2Quzwqiq80D9KxidhOKk9OU1Yi+rbDz4vSTlE/fBRHKVVln
 Rd8DhTK7aPG1bsiQSHCqcd4+a9qsHOn0LY7tS1m78CaVx1O8JhMBe5Y+NJEMR8AM
 iiWss8QyXggDkagzyiGrMHguCI5yM0EWqYfWtjLBiKOkzTjn9xzSbAbn9JK9TBBa
 aZ6JwNE1krtsTydJ1K37KFNabX/yyZR1KlYUc90FwRC1b9pfPHCmaMQ2imx/pg/y
 f3p3uG1UQmBXMMgchstqc1Xf4BqEeOV5pBKlF2Dwodi1ewZoS+WtuvqjURs2aJUr
 2YLOGPBtPHboWl+kjmn+csJJ8zSWw9byRyHHKJN7zFEx3iXZ4I9HMauzap+goWJq
 2NCKYjzyWkuR7YLR4w3q
 =vLaJ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 03 Apr 2018 17:10:22 BST
# gpg:                using RSA key BDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  gluster: Fix blockdev-add with server.N.type=unix
  blockjob: use qapi enum helpers
  blockjob: leak fix, remove from txn when failing early

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-04-04 12:33:23 +01:00
Max Reitz
82b45e0a0b block/file-posix: Fix fully preallocated truncate
Storing the lseek() result in an int results in it overflowing when the
file is at least 2 GB big.  Then, we have a 50 % chance of the result
being "negative" and thus thinking an error occurred when actually
everything went just fine.

So we should use the correct type for storing the result: off_t.

Reported-by: Daniel P. Berrange <berrange@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549231
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180228131315.30194-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-04-03 17:39:37 +02:00
Kevin Wolf
9dae635afa gluster: Fix blockdev-add with server.N.type=unix
The legacy command line interface gets the socket path from an option
called 'socket'. QAPI in contract uses SocketAddress, where the
corresponding option is called 'path'.

Fix the gluster block driver to accept both 'socket' and 'path', with
'path' being the preferred syntax.

https://bugzilla.redhat.com/show_bug.cgi?id=1545155

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20180403110810.25624-1-kwolf@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-04-03 09:57:14 -04:00
Jeff Cody
a03083a017 block: handle invalid lseek returns gracefully
In commit 223a23c198, we implemented a
workaround in the gluster driver to handle invalid values returned for
SEEK_DATA or SEEK_HOLE.

In some instances, these same invalid values can be seen in the posix
file handler as well - for example, it has been reported on FUSE gluster
mounts.

Calling assert() for these invalid values is overly harsh; we can safely
return -EIO and allow this case to be treated as a "learned nothing"
case (e.g., D4 / H4, as commented in the code).

This patch does the same thing that 223a23c198 did for gluster.c,
except in file-posix.c

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-04-03 15:25:17 +02:00
Kevin Wolf
3e4d88eabf gluster: Fix blockdev-add with server.N.type=unix
The legacy command line interface gets the socket path from an option
called 'socket'. QAPI in contract uses SocketAddress, where the
corresponding option is called 'path'.

Fix the gluster block driver to accept both 'socket' and 'path', with
'path' being the preferred syntax.

https://bugzilla.redhat.com/show_bug.cgi?id=1545155

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-04-03 15:20:36 +02:00
Eric Blake
00d96a4612 nbd: Fix 32-bit compilation on BLOCK_STATUS
iotests 123 and 209 fail on 32-bit platforms.  The culprit:
sizeof(extent) is wrong; we want sizeof(*extent).  But since
the struct is 8 bytes, it happened to work on 64-bit platforms
where the pointer is also 8 bytes (nasty).

Fixes: 78a33ab58
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180327210517.1804242-1-eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-04-02 08:45:21 -05:00
Peter Maydell
f55e88f2ab qapi patches for 2018-03-27, 2.12-rc1
- Marc-André Lureau: qmp-test: fix response leak
 - Eric Blake: tests: Silence false positive warning on generated test name
 - Laurent Vivier: 0/4 (partial) coccinelle: re-run scripts from scripst/coccinelle
 - Peter Xu: 0/8 Monitor: some oob related patches (fixes, new param, tests)
 - Satheesh Rajendran: hmp.c: Revert hmp_info_cpus output format change
 -----BEGIN PGP SIGNATURE-----
 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
 
 iQEcBAABCAAGBQJaumDMAAoJEKeha0olJ0NqY34IAJ1AJLZeqtb9HIPDrL/iydXQ
 qrMPxfz/fbcjMbomYvltvCYB3GMQw88YVtTuu2kS5aZsr6vLhpw2ZjGrb+a0llFM
 lF6oERg/6VmfQNaeqcHEzORIdG6/GEctxoWS07f+NgSrjdxNNHyU+i/LGvvA5K4m
 x7akPw3DZIF/oBLDlFgtl80KtOf8vQ8QitCIp2nsURQFLHvjVSoXHGUfhoiUUgNA
 89rF62Repq9xZXSsFr+Jpp4/bPGxA717sixXSxFUDuTVIZGnffbt4wVxLi4TKSx1
 1Ho6psPK3d82rQ9OeIc6CqulElLMPGn7aNwR1VR6ig0sB409K5PQJybJDqQD/C4=
 =ld/c
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-qapi-2018-03-27-v2' into staging

qapi patches for 2018-03-27, 2.12-rc1

- Marc-André Lureau: qmp-test: fix response leak
- Eric Blake: tests: Silence false positive warning on generated test name
- Laurent Vivier: 0/4 (partial) coccinelle: re-run scripts from scripst/coccinelle
- Peter Xu: 0/8 Monitor: some oob related patches (fixes, new param, tests)
- Satheesh Rajendran: hmp.c: Revert hmp_info_cpus output format change

# gpg: Signature made Tue 27 Mar 2018 16:18:36 BST
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-qapi-2018-03-27-v2:
  hmp.c: Revert hmp_info_cpus output format change
  tests: qmp-test: add test for new "x-oob"
  tests: Add parameter to qtest_init_without_qmp_handshake
  monitor: new parameter "x-oob"
  qmp: cleanup qmp queues properly
  tests: add oob-test for qapi-schema
  tests: let qapi-schema tests detect oob
  qapi: restrict allow-oob value to be "true"
  qmp: fix qmp_capabilities error regression
  qdict: remove useless cast
  error: Remove NULL checks on error_propagate() calls
  error: Strip trailing '\n' from error string arguments (again again)
  tests: Silence false positive warning on generated test name
  qmp-test: fix response leak

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-27 19:20:57 +01:00
Peter Maydell
6cf38cbf29 -----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJaulgHAAoJEJykq7OBq3PI0owH/3isnzyO8efBzlv4HlWb1qV0
 F7m9urAKZm1QLCkJBlli/G5iNaNUfrXSum4qYK22XAiwV5WAoPAjwrOvtTRRTSfW
 vFGAucmf6NKkomuZRAiPQftqR+DL9VhBC7mcdRwfwWZ82AGZZnAQsDf8y9e3kal2
 YPwogmvQi+bn2Psv6q3zYtrTW9bC1VJWzYFP4xcSpcxGOY/WfLhPX9nrhjQz07mm
 h/y9mWqXA8PytmopMLj31xRmhTSndIwo+uvScLH95UXRuAOhjyHMvvD4HfI4nkR+
 WD94pE6UiUpIdCZ6fh8H/jaCpN8jPnXI4XJg9cQcRHZYkOrHwQ8G1xxoOintjW0=
 =55b0
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Tue 27 Mar 2018 15:41:11 BST
# gpg:                using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  MAINTAINERS: add include/block/aio-wait.h
  coroutine: add test-aio coroutine queue chaining test case
  coroutine: avoid co_queue_wakeup recursion
  queue: add QSIMPLEQ_PREPEND()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-27 17:11:33 +01:00
Laurent Vivier
625eaca9e5 qdict: remove useless cast
Re-run Coccinelle script scripts/coccinelle/qobject.cocci

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20180323143202.28879-5-lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-27 10:17:32 -05:00
Peter Maydell
bdc408e91b A fix for dirty bitmap migration through shared storage, and a VMDK
patch keeping us from creating too large extents.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJauVVBAAoJEPQH2wBh1c9AGtYIAJ1ojl6+guKQHjtzv9W8ch50
 2tzH9/lFkhq/Tyay8MCcq1dWQr23tKqusxi6fDHVdc8XIx6fDuPFzWxUQwwLvS81
 9CA6Qs2NngkiAs89ZZ0Sc9aj+LzFbKvXDXPYd4FFeLVVzD0CA4qghH5THrrH6LRm
 4DoRUK1QejrOC0v2zCRQvEN6RlI6WFGCf9YiYKkdrvswnjtLgOobCt7TvC9Nade2
 smGyxtDENkV6bAZgQgxMAlf88GCyKslb4Fu6U+sfKMDejWmlYEREbBsQc+/Gp6Af
 R6ZeiXEJ+KUF34RktPTmhJlUbzRc4Mw0Ij7Abrfhtpre9hifIp62XSrM8sasgLo=
 =k+ly
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2018-03-26' into staging

A fix for dirty bitmap migration through shared storage, and a VMDK
patch keeping us from creating too large extents.

# gpg: Signature made Mon 26 Mar 2018 21:17:05 BST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2018-03-26:
  vmdk: return ERROR when cluster sector is larger than vmdk limitation
  iotests: enable shared migration cases in 169
  qcow2: fix bitmaps loading when bitmaps already exist
  qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-27 14:11:30 +01:00
Stefan Hajnoczi
c40a254570 coroutine: avoid co_queue_wakeup recursion
qemu_aio_coroutine_enter() is (indirectly) called recursively when
processing co_queue_wakeup.  This can lead to stack exhaustion.

This patch rewrites co_queue_wakeup in an iterative fashion (instead of
recursive) with bounded memory usage to prevent stack exhaustion.

qemu_co_queue_run_restart() is inlined into qemu_aio_coroutine_enter()
and the qemu_coroutine_enter() call is turned into a loop to avoid
recursion.

There is one change that is worth mentioning:  Previously, when
coroutine A queued coroutine B, qemu_co_queue_run_restart() entered
coroutine B from coroutine A.  If A was terminating then it would still
stay alive until B yielded.  After this patch B is entered by A's parent
so that a A can be deleted immediately if it is terminating.

It is safe to make this change since B could never interact with A if it
was terminating anyway.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20180322152834.12656-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-03-27 13:05:28 +01:00
yuchenlin
a77672ea3d vmdk: return ERROR when cluster sector is larger than vmdk limitation
VMDK has a hard limitation of extent size, which is due to the size of grain
table entry is 32 bits. It means it can only point to a grain located at
offset = 2^32. To avoid writing the user data beyond limitation and record a useless offset
in grain table. We should return ERROR here.

Signed-off-by: yuchenlin <yuchenlin@synology.com>
Message-id: 20180322133337.28024-1-yuchenlin@synology.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-26 21:17:24 +02:00
Vladimir Sementsov-Ogievskiy
2d949dfcef qcow2: fix bitmaps loading when bitmaps already exist
On reopen with existing bitmaps, instead of loading bitmaps, lets
reopen them if needed. This also fixes bitmaps migration through
shared storage.
Consider the case. Persistent bitmaps are stored on bdrv_inactivate.
Then, on destination process_incoming_migration_bh() calls
bdrv_invalidate_cache_all() which leads to
qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps are
already loaded on destination start.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180320170521.32152-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-26 21:17:24 +02:00
Vladimir Sementsov-Ogievskiy
b1336cc2ec qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint()
Add version of qcow2_reopen_bitmaps_rw, which do the same work but
also return a hint about was header updated or not. This will be
used in the following fix for bitmaps reloading after migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180320170521.32152-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-26 21:17:24 +02:00
Kevin Wolf
6f16f7c562 vhdx: Check for 4 GB maximum log size on creation
It's unclear what the real maximum is, but we use an uint32_t to store
the log size in vhdx_co_create(), so we should check that the given
value fits in 32 bits.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-03-26 12:17:43 +02:00
Kevin Wolf
0fcc38e7d0 vhdx: Don't use error_setg_errno() with constant errno
error_setg_errno() is meant for cases where we got an errno from the OS
that can add useful extra information to an error message. It's
pointless if we pass a constant errno, these cases should use plain
error_setg().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-03-26 12:17:43 +02:00
Kevin Wolf
b412f49407 vhdx: Require power-of-two block size on create
Images with a non-power-of-two block size are invalid and cannot be
opened. Reject such block sizes when creating an image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-03-26 12:17:43 +02:00
Kevin Wolf
2332d82589 parallels: Check maximum cluster size on create
It's unclear what the real maximum cluster size is for the Parallels
format, but let's at least make sure that we don't get integer
overflows in our .bdrv_co_create implementation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-26 12:17:43 +02:00
Kevin Wolf
120bc742c0 luks: Turn another invalid assertion into check
Commit e39e959e fixed an invalid assertion in the .bdrv_length
implementation, but left a similar assertion in place for
.bdrv_truncate. Instead of crashing when the user requests a too large
image size, fail gracefully.

A file size of exactly INT64_MAX caused failure before, but is actually
legal.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-03-26 12:17:40 +02:00
Kevin Wolf
95a14d51b2 vdi: Fix build with CONFIG_VDI_DEBUG
Use qemu_uuid_unparse() instead of uuid_unparse() to make vdi.c compile
again when CONFIG_VDI_DEBUG is set. In order to prevent future bitrot,
replace '#ifdef CONFIG_VDI_DEBUG' by 'if (VDI_DEBUG)' so that the
compiler always sees the code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-26 12:16:12 +02:00
Kevin Wolf
61fa64871d vdi: Change 'static' create option to 'preallocation' in QMP
What static=on really does is what we call metadata preallocation for
other block drivers. While we can still change the QMP interface, make
it more consistent by using 'preallocation' for VDI, too.

This doesn't implement any new functionality, so the only supported
preallocation modes are 'off' and 'metadata' for now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-26 12:16:12 +02:00
Alberto Garcia
abf754fe40 qcow2: Reset free_cluster_index when allocating a new refcount block
When we try to allocate new clusters we first look for available ones
starting from s->free_cluster_index and once we find them we increase
their reference counts. Before we get to call update_refcount() to do
this last step s->free_cluster_index is already pointing to the next
cluster after the ones we are trying to allocate.

During update_refcount() it may happen however that we also need to
allocate a new refcount block in order to store the refcounts of these
new clusters (and to complicate things further that may also require
us to grow the refcount table). After all this we don't know if the
clusters that we originally tried to allocate are still available, so
we return -EAGAIN to ask the caller to restart the search for free
clusters.

This is what can happen in a common scenario:

  1) We want to allocate a new cluster and we see that cluster N is
     free.

  2) We try to increase N's refcount but all refcount blocks are full,
     so we allocate a new one at N+1 (where s->free_cluster_index was
     pointing at).

  3) Once we're done we return -EAGAIN to look again for a free
     cluster, but now s->free_cluster_index points at N+2, so that's
     the one we allocate. Cluster N remains unallocated and we have a
     hole in the qcow2 file.

This can be reproduced easily:

     qemu-img create -f qcow2 -o cluster_size=512 hd.qcow2 1M
     qemu-io -c 'write 0 124k' hd.qcow2

After this the image has 132608 bytes (256 clusters), and the refcount
block is full. If we write 512 more bytes it should allocate two new
clusters: the data cluster itself and a new refcount block.

     qemu-io -c 'write 124k 512' hd.qcow2

However the image has now three new clusters (259 in total), and the
first one of them is empty (and unallocated):

     dd if=hd.qcow2 bs=512c skip=256 count=1 | hexdump -C

If we write larger amounts of data in the last step instead of the 512
bytes used in this example we can create larger holes in the qcow2
file.

What this patch does is reset s->free_cluster_index to its previous
value when alloc_refcount_block() returns -EAGAIN. This way the caller
will try to allocate again the original clusters if they are still
free.

The output of iotest 026 also needs to be updated because now that
images have no holes some tests fail at a different point and the
number of leaked clusters is different.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-26 12:16:00 +02:00
Fabiano Rosas
8140e786f0 block/blkreplay: Remove protocol-related fields
The blkreplay driver is not a protocol so it should implement bdrv_open
instead of bdrv_file_open and not provide a protocol_name.

Attempts to invoke this driver using protocol syntax
(i.e. blkreplay:<filename:options:...>) will now fail gracefully:

  $ qemu-img info blkreplay:foo
  qemu-img: Could not open 'blkreplay:foo': Unknown protocol 'blkreplay'

Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com>
Reviewed-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-26 12:16:00 +02:00
Fabiano Rosas
a7328ba55f block/throttle: Remove protocol-related fields
The throttle driver is not a protocol so it should implement bdrv_open
instead of bdrv_file_open and not provide a protocol_name.

Attempts to invoke this driver using protocol syntax
(i.e. throttle:<filename:options:...>) will now fail gracefully:

  $ qemu-img info throttle:foo
  qemu-img: Could not open 'throttle:foo': Unknown protocol 'throttle'

Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-26 12:16:00 +02:00
Fabiano Rosas
65d2c3e2f6 block/quorum: Remove protocol-related fields
The quorum driver is not a protocol so it should implement bdrv_open
instead of bdrv_file_open and not provide a protocol_name.

Attempts to invoke this driver using protocol syntax
(i.e. quorum:<filename:options:...>) will now fail gracefully:

  $ qemu-img info quorum:foo
  qemu-img: Could not open 'quorum:foo': Unknown protocol 'quorum'

Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-26 12:16:00 +02:00
Fabiano Rosas
cb83d2efe1 block/replication: Remove protocol_name field
The protocol_name field is used when selecting a driver via protocol
syntax (i.e. <protocol_name>:<filename:options:...>). Drivers that are
only selected explicitly (e.g. driver=replication,mode=primary,...)
should not have a protocol_name.

This patch removes the protocol_name field from the brdv_replication
structure so that attempts to invoke this driver using protocol syntax
will fail gracefully:

  $ qemu-img info replication:foo
  qemu-img: Could not open 'replication:': Unknown protocol 'replication'

Buglink: https://bugs.launchpad.net/qemu/+bug/1726733
Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-26 12:16:00 +02:00
Vladimir Sementsov-Ogievskiy
7e5c776d15 qapi: add block latency histogram interface
Set (and clear) histograms through new command
block-latency-histogram-set and show new statistics in
query-blockstats results.

For now, the command is marked experimental with prefix 'x-',
to gain experience with the interface without being stuck
with design decisions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180309165212.97144-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[eblake: fix typos, mention x- prefix in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-19 14:58:38 -05:00
Vladimir Sementsov-Ogievskiy
b741ae7417 block/accounting: introduce latency histogram
Introduce latency histogram statics for block devices.
For each accounted operation type, the latency region [0, +inf) is
divided into subregions by several points. Then, calculate
hits for each subregion.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180309165212.97144-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-19 14:58:37 -05:00
Max Reitz
7dc847ebba qapi: Replace qobject_to_X(o) by qobject_to(X, o)
This patch was generated using the following Coccinelle script:

@@
expression Obj;
@@
(
- qobject_to_qnum(Obj)
+ qobject_to(QNum, Obj)
|
- qobject_to_qstring(Obj)
+ qobject_to(QString, Obj)
|
- qobject_to_qdict(Obj)
+ qobject_to(QDict, Obj)
|
- qobject_to_qlist(Obj)
+ qobject_to(QList, Obj)
|
- qobject_to_qbool(Obj)
+ qobject_to(QBool, Obj)
)

and a bit of manual fix-up for overly long lines and three places in
tests/check-qjson.c that Coccinelle did not find.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20180224154033.29559-4-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: swap order from qobject_to(o, X), rebase to master, also a fix
to latent false-positive compiler complaint about hw/i386/acpi-build.c]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-19 14:58:36 -05:00
Paolo Bonzini
b16a7cfac5 iscsi: fix iSER compilation
This fails in Fedora 28.

Reported-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:39 +01:00
Fam Zheng
4f8e3a1f22 vvfat: Fix inherit_options flags
Overriding flags violates the precedence rules of
bdrv_reopen_queue_child. Just like the read-only option, no-flush should
be put into the options. The same is done in bdrv_temp_snapshot_options.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:39 +01:00
Liang Li
b76e4458b1 block/mirror: change the semantic of 'force' of block-job-cancel
When doing drive mirror to a low speed shared storage, if there was heavy
BLK IO write workload in VM after the 'ready' event, drive mirror block job
can't be canceled immediately, it would keep running until the heavy BLK IO
workload stopped in the VM.

Libvirt depends on the current block-job-cancel semantics, which is that
when used without a flag after the 'ready' event, the command blocks
until data is in sync.  However, these semantics are awkward in other
situations, for example, people may use drive mirror for realtime
backups while still wanting to use block live migration.  Libvirt cannot
start a block live migration while another drive mirror is in progress,
but the user would rather abandon the backup attempt as broken and
proceed with the live migration than be stuck waiting for the current
drive mirror backup to finish.

The drive-mirror command already includes a 'force' flag, which libvirt
does not use, although it documented the flag as only being useful to
quit a job which is paused.  However, since quitting a paused job has
the same effect as abandoning a backup in a non-paused job (namely, the
destination file is not in sync, and the command completes immediately),
we can just improve the documentation to make the force flag obviously
useful.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Reported-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Liang Li <liliangleo@didichuxing.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
1cfeaf386e vpc: Require aligned size in .bdrv_co_create
Perform the rounding to match a CHS geometry only in the legacy code
path in .bdrv_co_create_opts. QMP now requires that the user already
passes a CHS aligned image size, unless force-size=true is given.

CHS alignment is required to make the image compatible with Virtual PC,
but not for use with newer Microsoft hypervisors.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
182c883550 vpc: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to vpc, which
enables image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
09b68dab5a vhdx: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to vhdx, which
enables image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
da23248f27 vdi: Make comments consistent with other drivers
This makes the .bdrv_co_create(_opts) implementation of vdi look more
like the other recently converted block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
959355a476 qed: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to qed, which
enables image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
42a3e1ab36 qcow: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to qcow, which
enables image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
1511b49040 parallels: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to parallels, which
enables image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-03-19 12:01:39 +01:00
Max Reitz
e38105748f vdi: Implement .bdrv_co_create
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:39 +01:00
Max Reitz
ec73f06071 vdi: Move file creation to vdi_co_create_opts
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:39 +01:00
Max Reitz
49858b5098 vdi: Pull option parsing from vdi_co_create
In preparation of QAPI-fying VDI image creation, we have to create a
BlockdevCreateOptionsVdi type which is received by a (future)
vdi_co_create().

vdi_co_create_opts() now converts the QemuOpts object into such a
BlockdevCreateOptionsVdi object.  The protocol-layer file is still
created in vdi_co_do_create() (and BlockdevCreateOptionsVdi.file is set
to an empty string), but that will be addressed by a follow-up patch.

Note that cluster-size is not part of the QAPI schema because it is not
supported by default.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:39 +01:00
Kevin Wolf
3d7ed9c453 luks: Catch integer overflow for huge sizes
When you request an image size close to UINT64_MAX, the addition of the
crypto header may cause an integer overflow. Catch it instead of
silently truncating the image size.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-03-19 12:01:24 +01:00
Kevin Wolf
e39e959e89 luks: Turn invalid assertion into check
The .bdrv_getlength implementation of the crypto block driver asserted
that the payload offset isn't after EOF. This is an invalid assertion to
make as the image file could be corrupted. Instead, check it and return
-EIO if the file is too small for the payload offset.

Zero length images are fine, so trigger -EIO only on offset > len, not
on offset >= len as the assertion did before.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-03-19 12:01:24 +01:00
Kevin Wolf
1bedcaf120 luks: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to luks, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-03-19 12:01:24 +01:00
Kevin Wolf
1ec4f4160a luks: Create block_crypto_co_create_generic()
Everything that refers to the protocol layer or QemuOpts is moved out of
block_crypto_create_generic(), so that the remaining function is
suitable to be called by a .bdrv_co_create implementation.

LUKS is the only driver that actually implements the old interface, and
we don't intend to use it in any new drivers, so put the moved out code
directly into a LUKS function rather than creating a generic
intermediate one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-19 12:01:24 +01:00
Kevin Wolf
3b5a1f6aa4 luks: Separate image file creation from formatting
The crypto driver used to create the image file in a callback from the
crypto subsystem. If we want to implement .bdrv_co_create, this needs to
go away because that callback will get a reference to an already
existing block node.

Move the image file creation to block_crypto_create_generic().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow
11b61fbc0d blockjobs: add block-job-finalize
Instead of automatically transitioning from PENDING to CONCLUDED, gate
the .prepare() and .commit() phases behind an explicit acknowledgement
provided by the QMP monitor if auto_finalize = false has been requested.

This allows us to perform graph changes in prepare and/or commit so that
graph changes do not occur autonomously without knowledge of the
controlling management layer.

Transactions that have reached the "PENDING" state together can all be
moved to invoke their finalization methods by issuing block_job_finalize
to any one job in the transaction.

Jobs in a transaction with mixed job->auto_finalize settings will all
remain stuck in the "PENDING" state, as if the entire transaction was
specified with auto_finalize = false. Jobs that specified
auto_finalize = true, however, will still not emit the PENDING event.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow
35d6b368f2 blockjobs: ensure abort is called for cancelled jobs
Presently, even if a job is canceled post-completion as a result of
a failing peer in a transaction, it will still call .commit because
nothing has updated or changed its return code.

The reason why this does not cause problems currently is because
backup's implementation of .commit checks for cancellation itself.

I'd like to simplify this contract:

(1) Abort is called if the job/transaction fails
(2) Commit is called if the job/transaction succeeds

To this end: A job's return code, if 0, will be forcibly set as
-ECANCELED if that job has already concluded. Remove the now
redundant check in the backup job implementation.

We need to check for cancellation in both block_job_completed
AND block_job_completed_single, because jobs may be cancelled between
those two calls; for instance in transactions. This also necessitates
an ABORTING -> ABORTING transition to be allowed.

The check in block_job_completed could be removed, but there's no
point in starting to attempt to succeed a transaction that we know
in advance will fail.

This does NOT affect mirror jobs that are "canceled" during their
synchronous phase. The mirror job itself forcibly sets the canceled
property to false prior to ceding control, so such cases will invoke
the "commit" callback.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow
75f710599f blockjobs: add block_job_dismiss
For jobs that have reached their CONCLUDED state, prior to having their
last reference put down (meaning jobs that have completed successfully,
unsuccessfully, or have been canceled), allow the user to dismiss the
job's lingering status report via block-job-dismiss.

This gives management APIs the chance to conclusively determine if a job
failed or succeeded, even if the event broadcast was missed.

Note: block_job_do_dismiss and block_job_decommission happen to do
exactly the same thing, but they're called from different semantic
contexts, so both aliases are kept to improve readability.

Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS, she
has a friend coming in a future patch to fill the hole where 0x02 is.

Verbs:
Dismiss: operates on CONCLUDED jobs only.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow
0ec4dfb8d6 blockjobs: add block_job_verb permission table
Which commands ("verbs") are appropriate for jobs in which state is
also somewhat burdensome to keep track of.

As of this commit, it looks rather useless, but begins to look more
interesting the more states we add to the STM table.

A recurring theme is that no verb will apply to an 'undefined' job.

Further, it's not presently possible to restrict the "pause" or "resume"
verbs any more than they are in this commit because of the asynchronous
nature of how jobs enter the PAUSED state; justifications for some
seemingly erroneous applications are given below.

=====
Verbs
=====

Cancel:    Any state except undefined.
Pause:     Any state except undefined;
           'created': Requests that the job pauses as it starts.
           'running': Normal usage. (PAUSED)
           'paused':  The job may be paused for internal reasons,
                      but the user may wish to force an indefinite
                      user-pause, so this is allowed.
           'ready':   Normal usage. (STANDBY)
           'standby': Same logic as above.
Resume:    Any state except undefined;
           'created': Will lift a user's pause-on-start request.
           'running': Will lift a pause request before it takes effect.
           'paused':  Normal usage.
           'ready':   Will lift a pause request before it takes effect.
           'standby': Normal usage.
Set-speed: Any state except undefined, though ready may not be meaningful.
Complete:  Only a 'ready' job may accept a complete request.

=======
Changes
=======

(1)

To facilitate "nice" error checking, all five major block-job verb
interfaces in blockjob.c now support an errp parameter:

- block_job_user_cancel is added as a new interface.
- block_job_user_pause gains an errp paramter
- block_job_user_resume gains an errp parameter
- block_job_set_speed already had an errp parameter.
- block_job_complete already had an errp parameter.

(2)

block-job-pause and block-job-resume will no longer no-op when trying
to pause an already paused job, or trying to resume a job that isn't
paused. These functions will now report that they did not perform the
action requested because it was not possible.

iotests have been adjusted to address this new behavior.

(3)

block-job-complete doesn't worry about checking !block_job_started,
because the permission table guards against this.

(4)

test-bdrv-drain's job implementation needs to announce that it is
'ready' now, in order to be completed.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow
c9de40505f blockjobs: add state transition table
The state transition table has mostly been implied. We're about to make
it a bit more complex, so let's make the STM explicit instead.

Perform state transitions with a function that for now just asserts the
transition is appropriate.

Transitions:
Undefined -> Created: During job initialization.
Created   -> Running: Once the job is started.
                      Jobs cannot transition from "Created" to "Paused"
                      directly, but will instead synchronously transition
                      to running to paused immediately.
Running   -> Paused:  Normal workflow for pauses.
Running   -> Ready:   Normal workflow for jobs reaching their sync point.
                      (e.g. mirror)
Ready     -> Standby: Normal workflow for pausing ready jobs.
Paused    -> Running: Normal resume.
Standby   -> Ready:   Resume of a Standby job.

+---------+
|UNDEFINED|
+--+------+
   |
+--v----+
|CREATED|
+--+----+
   |
+--v----+     +------+
|RUNNING<----->PAUSED|
+--+----+     +------+
   |
+--v--+       +-------+
|READY<------->STANDBY|
+-----+       +-------+

Notably, there is no state presently defined as of this commit that
deals with a job after the "running" or "ready" states, so this table
will be adjusted alongside the commits that introduce those states.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00