qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Kevin Wolf	cf3129323f	block-backend: Queue requests while drained This fixes devices like IDE that can still start new requests from I/O handlers in the CPU thread while the block backend is drained. The basic assumption is that in a drain section, no new requests should be allowed through a BlockBackend (blk_drained_begin/end don't exist, we get drain sections only on the node level). However, there are two special cases where requests should not be queued: 1. Block jobs: We already make sure that block jobs are paused in a drain section, so they won't start new requests. However, if the drain_begin is called on the job's BlockBackend first, it can happen that we deadlock because the job stays busy until it reaches a pause point - which it can't if its requests aren't processed any more. The proper solution here would be to make all requests through the job's filter node instead of using a BlockBackend. For now, just disabling request queuing on the job BlockBackend is simpler. 2. In test cases where making requests through bdrv_* would be cumbersome because we'd need a BdrvChild. As we already got the functionality to disable request queuing from 1., use it in tests, too, for convenience. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-08-16 10:25:16 +02:00
Kevin Wolf	d2da5e288a	mirror: Keep mirror_top_bs drained after dropping permissions mirror_top_bs is currently implicitly drained through its connection to the source or the target node. However, the drain section for target_bs ends early after moving mirror_top_bs from src to target_bs, so that requests can already be restarted while mirror_top_bs is still present in the chain, but has dropped all permissions and therefore runs into an assertion failure like this: qemu-system-x86_64: block/io.c:1634: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed. Keep mirror_top_bs drained until all graph changes have completed. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-08-16 10:25:16 +02:00
Max Reitz	9adc1cb49a	mirror: Only mirror granularity-aligned chunks In write-blocking mode, all writes to the top node directly go to the target. We must only mirror chunks of data that are aligned to the job's granularity, because that is how the dirty bitmap works. Therefore, the request alignment for writes must be the job's granularity (in write-blocking mode). Unfortunately, this forces all reads and writes to have the same granularity (we only need this alignment for writes to the target, not the source), but that is something to be fixed another time. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190805153308.2657-1-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Fixes: `d06107ade0` Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-06 13:17:25 +02:00
Max Reitz	e5182c1c57	block: Add BDS.never_freeze The commit and the mirror block job must be able to drop their filter node at any point. However, this will not be possible if any of the BdrvChild links to them is frozen. Therefore, we need to prevent them from ever becoming frozen. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190703172813.6868-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:40 +02:00
Andrey Shinkevich	170d3bd341	block: include base when checking image chain for block allocation This patch is used in the 'block/stream: introduce a bottom node' that is following. Instead of the base node, the caller may pass the node that has the base as its backing image to the function bdrv_is_allocated_above() with a new parameter include_base = true and get rid of the dependency on the base that may change during commit/stream parallel jobs. Now, if the specified base is not found in the backing image chain, the QEMU will abort. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1559152576-281803-2-git-send-email-andrey.shinkevich@virtuozzo.com [mreitz: Squashed in the following as a rebase on conflicting patches:] Message-id: e3cf99ae-62e9-8b6e-5a06-d3c8b9363b85@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-02 03:53:04 +02:00
Max Reitz	f94dc3b414	block/mirror: Fix child permissions We cannot use bdrv_child_try_set_perm() to give up all restrictions on the child edge, and still have bdrv_mirror_top_child_perm() request BLK_PERM_WRITE. Fix this by making bdrv_mirror_top_child_perm() return 0/BLK_PERM_ALL when we want to give up all permissions, and replacing bdrv_child_try_set_perm() by bdrv_child_refresh_perms(). The bdrv_child_try_set_perm() before removing the node with bdrv_replace_node() is then unnecessary. No permissions have changed since the previous invocation of bdrv_child_try_set_perm(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy	cc19f1773d	block/replication: drop usage of bs->job We are going to remove bs->job pointer. Drop it's usage in replication code. Additionally we have to return job pointer from some mirror APIs. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:09 +02:00
Kevin Wolf	d0ee0204f4	block: Remove wrong bdrv_set_aio_context() calls The mirror and commit block jobs use bdrv_set_aio_context() to move their filter node into the right AioContext before hooking it up in the graph. Similarly, bdrv_open_backing_file() explicitly moves the backing file node into the right AioContext first. This isn't necessary any more, they get automatically moved into the right context now when attaching them. However, in the case of bdrv_open_backing_file() with a node reference, it's actually not only unnecessary, but even wrong: The unchecked bdrv_set_aio_context() changes the AioContext of the child node even if other parents require it to retain the old context. So this is not only a simplification, but a bug fix, too. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342 Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	d861ab3acf	block: Add BlockBackend.ctx This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	9ff7f0df87	blockjob: Propagate AioContext change to all job nodes Block jobs require that all of the nodes the job is using are in the same AioContext. Therefore all BdrvChild objects of the job propagate .(can_)set_aio_context to all other job nodes, so that the switch is checked and performed consistently even if both nodes are in different subtrees. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Kevin Wolf	80f5c33ff3	block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers Filter drivers that support .bdrv_co_pwrite_zeroes can safely advertise BDRV_REQ_NO_FALLBACK because they just forward the request flags to their child node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Sergio Lopez	5e771752a1	mirror: Confirm we're quiesced only if the job is paused or cancelled While child_job_drained_begin() calls to job_pause(), the job doesn't actually transition between states until it runs again and reaches a pause point. This means bdrv_drained_begin() may return with some jobs using the node still having 'busy == true'. As a consequence, block_job_detach_aio_context() may get into a deadlock, waiting for the job to be actually paused, while the coroutine servicing the job is yielding and doesn't get the opportunity to get scheduled again. This situation can be reproduced by issuing a 'block-commit' immediately followed by a 'device_del'. To ensure bdrv_drained_begin() only returns when the jobs have been paused, we change mirror_drained_poll() to only confirm it's quiesced when job->paused == true and there aren't any in-flight requests, except if we reached that point by a drained section initiated by the mirror/commit job itself. The other block jobs shouldn't need any changes, as the default drained_poll() behavior is to only confirm it's quiesced if the job is not busy or completed. Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Alberto Garcia	ef53dc09ed	block: Freeze the backing chain for the duration of the mirror job Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Max Reitz	998b3a1e5a	block: Purify .bdrv_refresh_filename() Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	e24518e303	block: Use children list in bdrv_refresh_filename bdrv_refresh_filename() should invoke itself recursively on all children, not just on file. With that change, we can remove the manual invocations in blkverify, quorum, commit, mirror, and blklogwrites. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Alberto Garcia	67b24427fe	mirror: Block the source BlockDriverState in mirror_start_job() The mirror_start_job() function used for the commit-active job blocks the source, target and all intermediate nodes for the duration of the job. target <- intermediate <- source Since `4ef85a9c23` this function creates a dummy mirror_top_bs that goes on top of the source node, and it is this dummy node that gets blocked instead. The source node is never blocked or added to the job's list of nodes. target <- intermediate <- source <- mirror_top At the moment I don't think it is possible to exploit this problem because any additional job on 'source' would either be forbidden for other reasons or it would need to involve an additional node that is blocked, causing an error. This can be seen in the error messages, however, because they never refer to the source node being blocked: $ qemu-img create -f qcow2 hd0.qcow2 1M $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2 $ qemu-io -c 'write 0 1M' hd0.qcow2 $ $QEMU -drive if=none,file=hd1.qcow2,node-name=hd1 { "execute": "qmp_capabilities" } { "execute": "block-commit", "arguments": {"device": "hd1", "speed": 256}} { "execute": "block-stream", "arguments": {"device": "hd1"}} { "error": {"class": "GenericError", "desc": "Node 'hd0' is busy: block device is in use by block job: commit"}} After this patch the error message refers to 'hd1', as it should. The expected output of iotest 141 also needs to be updated for the same reason. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-01 13:46:44 +01:00
Alberto Garcia	e917e2cb2a	mirror: Release the dirty bitmap if mirror_start_job() fails At the moment I don't see how to make this function fail after the dirty bitmap has been created, but if that was possible then we would hit the assert(QLIST_EMPTY(&bs->dirty_bitmaps)) in bdrv_close(). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-01 13:46:44 +01:00
Vladimir Sementsov-Ogievskiy	1eaf1b0fdf	block/mirror: fix and improve do_sync_target_write Use bdrv_dirty_bitmap_next_dirty_area() instead of bdrv_dirty_iter_next_area(), because of the following problems of bdrv_dirty_iter_next_area(): 1. Using HBitmap iterators we should carefully handle unaligned offset, as first call to hbitmap_iter_next() may return a value less than original offset (actually, it will be original offset rounded down to bitmap granularity). This handling is not done in do_sync_target_write(). 2. bdrv_dirty_iter_next_area() handles unaligned max_offset incorrectly: look at the code: if (max_offset == iter->bitmap->size) { /* If max_offset points to the image end, round it up by the * bitmap granularity */ gran_max_offset = ROUND_UP(max_offset, granularity); } else { gran_max_offset = max_offset; } ret = hbitmap_iter_next(&iter->hbi, false); if (ret < 0 \|\| ret + granularity > gran_max_offset) { return false; } and assume that max_offset != iter->bitmap->size but still unaligned. if 0 < ret < max_offset we found dirty area, but the function can return false in this case (if ret + granularity > max_offset). 3. bdrv_dirty_iter_next_area() uses inefficient loop to find the end of the dirty area. Let's use more efficient hbitmap_next_zero instead (bdrv_dirty_bitmap_next_dirty_area() do so) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-15 18:26:50 -05:00
Paolo Bonzini	b58deb344d	qemu/queue.h: leave head structs anonymous unless necessary Most list head structs need not be given a name. In most cases the name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV or reverse iteration, but this does not apply to lists of other kinds, and even for QTAILQ in practice this is only rarely needed. In addition, we will soon reimplement those macros completely so that they do not need a name for the head struct. So clean up everything, not giving a name except in the rare case where it is necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
Stefan Hajnoczi	537c3d4f64	block/mirror: add missing coroutine_fn annotations Marking a function coroutine_fn currently has no effect on the compiler, but it documents that this function must be called from coroutine context and it may yield. This is important information for the programmer. Also, if we ever transition to a stackless coroutine implementation, then it's likely that the annotation will become mandatory so the compiler can use the correct calling convention for coroutine functions. Cc: Max Reitz <mreitz@redhat.com> Cc: John Snow <jsnow@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:55:02 +01:00
Alberto Garcia	1ba7938895	block: Use bdrv_reopen_set_read_only() in the mirror driver The 'block-commit' QMP command is implemented internally using two different drivers. If the source image is the active layer then the mirror driver is used (commit_active_start()), otherwise the commit driver is used (commit_start()). In both cases the destination image must be put temporarily in read-write mode. This is done correctly in the latter case, but what commit_active_start() does is copy all flags instead. This patch replaces the bdrv_reopen() calls in that function with bdrv_reopen_set_read_only() so that only the read-only status is changed. A similar change is made in mirror_exit(), which is also used by the 'drive-mirror' and 'blockdev-mirror' commands. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:55:02 +01:00
Vladimir Sementsov-Ogievskiy	d12ade5732	mirror: fix dead-lock Let start from the beginning: Commit `b9e413dd37` (in 2.9) "block: explicitly acquire aiocontext in aio callbacks that need it" added pairs of aio_context_acquire/release to mirror_write_complete and mirror_read_complete, when they were aio callbacks for blk_aio_* calls. Then, commit `2e1990b26e` (in 3.0) "block/mirror: Convert to coroutines" dropped these blk_aio_* calls, than mirror_write_complete and mirror_read_complete are not callbacks more, and don't need additional aiocontext acquiring. Furthermore, mirror_read_complete calls blk_co_pwritev inside these pair of aio_context_acquire/release, which leads to the following dead-lock with mirror: (gdb) info thr Id Target Id Frame 3 Thread (LWP 145412) "qemu-system-x86" syscall () 2 Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait () * 1 Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait () (gdb) bt #0 __lll_lock_wait () #1 _L_lock_812 () #2 __GI___pthread_mutex_lock #3 qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at util/qemu-thread-posix.c:66 #4 qemu_mutex_lock_iothread_impl #5 os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236 #6 main_loop_wait (nonblocking=0) at util/main-loop.c:497 #7 main_loop () at vl.c:1892 #8 main Printing contents of qemu_global_mutex, I see that "__owner = 145416", so, thr1 is main loop, and now it wants BQL, which is owned by thr2. (gdb) thr 2 (gdb) bt #0 __lll_lock_wait () #1 _L_lock_870 () #2 __GI___pthread_mutex_lock #3 qemu_mutex_lock_impl (mutex=0x561034d25dc0, ... #4 aio_context_acquire (ctx=0x561034d25d60) #5 dma_blk_cb #6 dma_blk_io #7 dma_blk_read #8 ide_dma_cb #9 bmdma_cmd_writeb #10 bmdma_write #11 memory_region_write_accessor #12 access_with_adjusted_size #15 flatview_write #16 address_space_write #17 address_space_rw #18 kvm_handle_io #19 kvm_cpu_exec #20 qemu_kvm_cpu_thread_fn #21 qemu_thread_start #22 start_thread #23 clone () Printing mutex in fr 2, I see "__owner = 145411", so thr2 wants aio context mutex, which is owned by thr1. Classic dead-lock. Then, let's check that aio context is hold by mirror coroutine: just print coroutine stack of first tracked request in mirror job target: (gdb) [...] (gdb) qemu coroutine 0x561035dd0860 #0 qemu_coroutine_switch #1 qemu_coroutine_yield #2 qemu_co_mutex_lock_slowpath #3 qemu_co_mutex_lock #4 qcow2_co_pwritev #5 bdrv_driver_pwritev #6 bdrv_aligned_pwritev #7 bdrv_co_pwritev #8 blk_co_pwritev #9 mirror_read_complete () at block/mirror.c:232 #10 mirror_co_read () at block/mirror.c:370 #11 coroutine_trampoline #12 __start_context Yes it is mirror_read_complete calling blk_co_pwritev after acquiring aio context. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-03 16:50:58 +01:00
John Snow	737efc1eda	block/mirror: conservative mirror_exit refactor For purposes of minimum code movement, refactor the mirror_exit callback to use the post-finalization callbacks in a trivial way. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20180906130225.5118-7-jsnow@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Added comment for the mirror_exit() function] Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:31:15 +02:00
John Snow	c2924ceaa7	block/mirror: don't install backing chain on abort In cases where we abort the block/mirror job, there's no point in installing the new backing chain before we finish aborting. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20180906130225.5118-6-jsnow@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:31:15 +02:00
John Snow	a1999b3348	block/mirror: add block job creation flags Add support for taking and passing forward job creation flags. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20180906130225.5118-3-jsnow@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-09-25 15:31:15 +02:00
John Snow	7b508f6b7a	block/mirror: utilize job_exit shim Change the manual deferment to mirror_exit into the implicit callback to job_exit and the mirror_exit callback. This does change the order of some bdrv_unref calls and job_completed, but thanks to the new context in which we call .exit, this is safe to defer the possible flushing of any nodes to the job_finalize_single cleanup stage. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20180830015734.19765-6-jsnow@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-08-31 16:28:33 +02:00
John Snow	3d1f8b07a4	jobs: canonize Error object Jobs presently use both an Error object in the case of the create job, and char strings in the case of generic errors elsewhere. Unify the two paths as just j->err, and remove the extra argument from job_completed. The integer error code for job_completed is kept for now, to be removed shortly in a separate patch. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20180830015734.19765-3-jsnow@redhat.com [mreitz: Dropped a superfluous g_strdup()] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-08-31 16:28:33 +02:00
John Snow	f67432a201	jobs: change start callback to run callback Presently we codify the entry point for a job as the "start" callback, but a more apt name would be "run" to clarify the idea that when this function returns we consider the job to have "finished," except for any cleanup which occurs in separate callbacks later. As part of this clarification, change the signature to include an error object and a return code. The error ptr is not yet used, and the return code while captured, will be overwritten by actions in the job_completed function. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20180830015734.19765-2-jsnow@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-08-31 16:28:33 +02:00
Vladimir Sementsov-Ogievskiy	f66b1f0e27	block: drop empty .bdrv_close handlers .bdrv_close handler is optional after previous commit, no needs to keep empty functions more. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-08-15 12:50:39 +02:00
Kevin Wolf	86fae10c64	mirror: Fail gracefully for source == target blockdev-mirror with the same node for source and target segfaults today: A node is in its own backing chain, so mirror_start_job() decides that this is an active commit. When adding the intermediate nodes with block_job_add_bdrv(), it starts the iteration through the subchain with the backing file of source, though, so it never reaches target and instead runs into NULL at the base. While we could fix that by starting with source itself, there is no point in allowing mirroring a node into itself and I wouldn't be surprised if this caused more problems later. So just check for this scenario and error out. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-08-15 12:50:39 +02:00
Fam Zheng	0b9fd3f467	block: Use BdrvChild to discard Other I/O functions are already using a BdrvChild pointer in the API, so make discard do the same. It makes it possible to initiate the same permission checks before doing I/O, and much easier to share the helper functions for this, which will be added and used by write, truncate and copy range paths. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:52 +02:00
Max Reitz	481debaa32	block/mirror: Add copy mode QAPI interface This patch allows the user to specify whether to use active or only background mode for mirror block jobs. Currently, this setting will remain constant for the duration of the entire block job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:05:16 +02:00
Max Reitz	d06107ade0	block/mirror: Add active mirroring This patch implements active synchronous mirroring. In active mode, the passive mechanism will still be in place and is used to copy all initially dirty clusters off the source disk; but every write request will write data both to the source and the target disk, so the source cannot be dirtied faster than data is mirrored to the target. Also, once the block job has converged (BLOCK_JOB_READY sent), source and target are guaranteed to stay in sync (unless an error occurs). Active mode is completely optional and currently disabled at runtime. A later patch will add a way for users to enable it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:05:15 +02:00
Max Reitz	429076e88d	block/mirror: Add MirrorBDSOpaque This will allow us to access the block job data when the mirror block driver becomes more complex. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:59 +02:00
Max Reitz	138f9fffb8	block/mirror: Use source as a BdrvChild With this, the mirror_top_bs is no longer just a technically required node in the BDS graph but actually represents the block job operation. Also, drop MirrorBlockJob.source, as we can reach it through mirror_top_bs->backing. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:54 +02:00
Max Reitz	1181e19a6d	block/mirror: Wait for in-flight op conflicts This patch makes the mirror code differentiate between simply waiting for any operation to complete (mirror_wait_for_free_in_flight_slot()) and specifically waiting for all operations touching a certain range of the virtual disk to complete (mirror_wait_on_conflicts()). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:53 +02:00
Max Reitz	12aa40822d	block/mirror: Use CoQueue to wait on in-flight ops Attach a CoQueue to each in-flight operation so if we need to wait for any we can use it to wait instead of just blindly yielding and hoping for some operation to wake us. A later patch will use this infrastructure to allow requests accessing the same area of the virtual disk to specifically wait for each other. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:52 +02:00
Max Reitz	2e1990b26e	block/mirror: Convert to coroutines In order to talk to the source BDS (and maybe in the future to the target BDS as well) directly, we need to convert our existing AIO requests into coroutine I/O requests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:48 +02:00
Max Reitz	4295c5fc61	block/mirror: Pull out mirror_perform() When converting mirror's I/O to coroutines, we are going to need a point where these coroutines are created. mirror_perform() is going to be that point. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:43 +02:00
Kevin Wolf	89bd030533	block: Really pause block jobs on drain We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	1266c9b9f5	job: Add error message for failing jobs So far we relied on job->ret and strerror() to produce an error message for failed jobs. Not surprisingly, this tends to result in completely useless messages. This adds a Job.error field that can contain an error string for a failing job, and a parameter to job_completed() that sets the field. As a default, if NULL is passed, we continue to use strerror(job->ret). All existing callers are changed to pass NULL. They can be improved in separate patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-05-30 13:31:01 +02:00
Kevin Wolf	30a5c887bf	job: Move progress fields to Job BlockJob has fields .offset and .len, which are actually misnomers today because they are no longer tied to block device sizes, but just progress counters. As such they make a lot of sense in generic Jobs. This patch moves the fields to Job and renames them to .progress_current and .progress_total to describe their function better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	2e1795b581	job: Add job_transition_to_ready() The transition to the READY state was still performed in the BlockJob layer, in the same function that sent the BLOCK_JOB_READY QMP event. This patch brings the state transition to the Job layer and implements the QMP event using a notifier called from the Job layer, like we already do for other events related to state transitions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	198c49cc8d	job: Add job_yield() This moves block_job_yield() to the Job layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	3d70ff53b6	job: Move completion and cancellation to Job This moves the top-level job completion and cancellation functions from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	3453d97243	job: Move .complete callback to Job This moves the .complete callback that tells a READY job to complete from BlockJobDriver to JobDriver. The wrapper function job_complete() doesn't require anything block job specific any more and can be moved to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	b69f777dd9	job: Add job_drain() block_job_drain() contains a blk_drain() call which cannot be moved to Job, so add a new JobDriver callback JobDriver.drain which has a common implementation for all BlockJobs. In addition to this we keep the existing BlockJobDriver.drain callback that is called by the common drain implementation for all block jobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	004e95df98	job: Convert block_job_cancel_async() to Job block_job_cancel_async() did two things that were still block job specific: * Setting job->force. This field makes sense on the Job level, so we can just move it. While at it, rename it to job->force_cancel to make its purpose more obvious. * Resetting the I/O status. This can't be moved because generic Jobs don't have an I/O status. What the function really implements is a user resume, except without entering the coroutine. Consequently, it makes sense to call the .user_resume driver callback here which already resets the I/O status. The old block_job_cancel_async() has two separate if statements that check job->iostatus != BLOCK_DEVICE_IO_STATUS_OK and job->user_paused. However, the former condition always implies the latter (as is asserted in block_job_iostatus_reset()), so changing the explicit call of block_job_iostatus_reset() on the former condition with the .user_resume callback on the latter condition is equivalent and doesn't need to access any BlockJob specific state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	4ad351819b	job: Move single job finalisation to Job This moves the finalisation of a single job from BlockJob to Job. Some part of this code depends on job transactions, and job transactions call this code, we introduce some temporary calls from Job functions to BlockJob ones. This will be fixed once transactions move to Job, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	bb02b65c7d	job: Move BlockJobCreateFlags to Job This renames the BlockJobCreateFlags constants, moves a few JOB_INTERNAL checks to job_create() and the auto_{finalize,dismiss} fields from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	b15de82867	job: Move pause/resume functions to Job While we already moved the state related to job pausing to Job, the functions to do were still BlockJob only. This commit moves them over to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	5d43e86e11	job: Add job_sleep_ns() There is nothing block layer specific about block_job_sleep_ns(), so move the function to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	da01ff7f38	job: Move coroutine and related code to Job This commit moves some core functions for dealing with the job coroutine from BlockJob to Job. This includes primarily entering the coroutine (both for the first and reentering) and yielding explicitly and at pause points. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	1908a5590c	job: Move defer_to_main_loop to Job Move the defer_to_main_loop functionality from BlockJob to Job. The code can be simplified because we can use job->aio_context in job_defer_to_main_loop_bh() now, instead of having to access the BlockDriverState. Probably taking the data->aio_context lock in addition was already unnecessary in the old code because we didn't actually make use of anything protected by the old AioContext except getting the new AioContext, in case it changed between scheduling the BH and running it. But it's certainly unnecessary now that the BDS isn't accessed at all any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	daa7f2f946	job: Move cancelled to Job We cannot yet move the whole logic around job cancelling to Job because it depends on quite a few other things that are still only in BlockJob, but we can move the cancelled field at least. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	80fa2c756b	job: Add reference counting This moves reference counting from BlockJob to Job. In order to keep calling the BlockJob cleanup code when the job is deleted via job_unref(), introduce a new JobDriver.free callback. Every block job must use block_job_free() for this callback, this is asserted in block_job_create(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	252291eaea	job: Add JobDriver.job_type This moves the job_type field from BlockJobDriver to JobDriver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	8e4c87000f	job: Rename BlockJobType into JobType QAPI types aren't externally visible, so we can rename them without causing problems. Before we add a job type to Job, rename the enum so it can be used for more than just block jobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	33e9e9bd62	job: Create Job, JobDriver and job_create() This is the first step towards creating an infrastructure for generic background jobs that aren't tied to a block device. For now, Job only stores its ID and JobDriver, the rest stays in BlockJob. The following patches will move over more parts of BlockJob to Job if they are meaningful outside the context of a block job. BlockJob.driver is now redundant, but this patch leaves it around to avoid unnecessary churn. The next patches will get rid of almost all of its uses anyway so that it can be removed later with much less churn. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Max Reitz	228345bf5d	block: Support BDRV_REQ_WRITE_UNCHANGED in filters Update the rest of the filter drivers to support BDRV_REQ_WRITE_UNCHANGED. They already forward write request flags to their children, so we just have to announce support for it. This patch does not cover the replication driver because that currently does not support flags at all, and because it just grabs the WRITE permission for its children when it can, so we should be fine just submitting the incoming WRITE_UNCHANGED requests as normal writes. It also does not cover format drivers for similar reasons. They all use bdrv_format_default_perms() as their .bdrv_child_perm() implementation so they just always grab the WRITE permission for their file children whenever possible. In addition, it often would be difficult to ascertain whether incoming unchanging writes end up as unchanging writes in their files. So we just leave them as normal potentially changing writes. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180421132929.21610-7-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Kevin Wolf	dee81d5111	blockjob: Introduce block_job_ratelimit_get_delay() This gets us rid of more direct accesses to BlockJob fields from the job drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	18bb69287e	blockjob: Implement block_job_set_speed() centrally All block job drivers support .set_speed and all of them duplicate the same code to implement it. Move that code to blockjob.c and remove the now useless callback. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	f05fee508f	blockjob: Move RateLimit to BlockJob Every block job has a RateLimit, and they all do the exact same thing with it, so it should be common infrastructure. Move the struct field for a start. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	05df8a6a2b	blockjob: Wrappers for progress counter access Block job drivers are not expected to mess with the internals of the BlockJob object, so provide wrapper functions for one of the cases where they still do it: Updating the progress counter. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:49 +02:00
Max Reitz	eb36639f7b	block/mirror: Make cancel always cancel pre-READY Commit `b76e4458b1` made the mirror block job respect block-job-cancel's @force flag: With that flag set, it would now always really cancel, even post-READY. Unfortunately, it had a side effect: Without that flag set, it would now never cancel, not even before READY. Considering that is an incompatible change and not noted anywhere in the commit or the description of block-job-cancel's @force parameter, this seems unintentional and we should revert to the previous behavior, which is to immediately cancel the job when block-job-cancel is called before source and target are in sync (i.e. before the READY event). Cc: qemu-stable@nongnu.org Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1572856 Reported-by: Yanan Fu <yfu@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20180501220509.14152-2-mreitz@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-08 10:47:27 -04:00
Stefan Hajnoczi	ddc4115efd	block/mirror: honor ratelimit again Commit `b76e4458b1` ("block/mirror: change the semantic of 'force' of block-job-cancel") accidentally removed the ratelimit in the mirror job. Reintroduce the ratelimit but keep the block-job-cancel force=true behavior that was added in commit `b76e4458b1`. Note that block_job_sleep_ns() returns immediately when the job is cancelled. Therefore it's safe to unconditionally call block_job_sleep_ns() - a cancelled job does not sleep. This commit fixes the non-deterministic qemu-iotests 185 output. The test relies on the ratelimit to make the job sleep until the 'quit' command is processed. Previously the job could complete before the 'quit' command was received since there was no ratelimit. Cc: Liang Li <liliang.opensource@gmail.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180424123527.19168-1-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-08 10:47:27 -04:00
Liang Li	b76e4458b1	block/mirror: change the semantic of 'force' of block-job-cancel When doing drive mirror to a low speed shared storage, if there was heavy BLK IO write workload in VM after the 'ready' event, drive mirror block job can't be canceled immediately, it would keep running until the heavy BLK IO workload stopped in the VM. Libvirt depends on the current block-job-cancel semantics, which is that when used without a flag after the 'ready' event, the command blocks until data is in sync. However, these semantics are awkward in other situations, for example, people may use drive mirror for realtime backups while still wanting to use block live migration. Libvirt cannot start a block live migration while another drive mirror is in progress, but the user would rather abandon the backup attempt as broken and proceed with the live migration than be stuck waiting for the current drive mirror backup to finish. The drive-mirror command already includes a 'force' flag, which libvirt does not use, although it documented the flag as only being useful to quit a job which is paused. However, since quitting a paused job has the same effect as abandoning a backup in a non-paused job (namely, the destination file is not in sync, and the command completes immediately), we can just improve the documentation to make the force flag obviously useful. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: John Snow <jsnow@redhat.com> Reported-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Liang Li <liliangleo@didichuxing.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
John Snow	75859b9420	blockjobs: model single jobs as transactions model all independent jobs as single job transactions. It's one less case we have to worry about when we add more states to the transition machine. This way, we can just treat all job lifetimes exactly the same. This helps tighten assertions of the STM graph and removes some conditionals that would have been needed in the coming commits adding a more explicit job lifetime management API. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
Eric Blake	3e4d0e72b7	block: Switch passthrough drivers to .bdrv_co_block_status() We are gradually moving away from sector-based interfaces, towards byte-based. Update the generic helpers, and all passthrough clients (blkdebug, commit, mirror, throttle) accordingly. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Paolo Bonzini	5bf1d5a73a	blockjob: remove clock argument from block_job_sleep_ns All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to support more than one clock when block_job_sleep_ns switches to a single timer stored in the BlockJob struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 15:11:02 +01:00
Eric Blake	3182664220	block: Convert bdrv_get_block_status_above() to bytes We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status_above() to bdrv_block_status_above() ensures that the compiler enforces that all callers are updated. Likewise, since it a byte interface allows an offset mapping that might not be sector aligned, split the mapping out of the return value and into a pass-by-reference parameter. For now, the io.c layer still assert()s that all uses are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), plus updates for the new split return interface. But some code, particularly bdrv_block_status(), gets a lot simpler because it no longer has to mess with sectors. Likewise, mirror code no longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop an assertion about alignment because the loop no longer depends on alignment (never mind that we don't really have a driver that reports sub-sector alignments, so it's not really possible to test the effect of sub-sector mirroring). Fix a neighboring assertion to use is_power_of_2 while there. For ease of review, bdrv_get_block_status() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	7cfd527525	block: Make bdrv_round_to_clusters() signature more useful In the process of converting sector-based interfaces to bytes, I'm finding it easier to represent a byte count as a 64-bit integer at the block layer (even if we are internally capped by SIZE_MAX or even INT_MAX for individual transactions, it's still nicer to not have to worry about truncation/overflow issues on as many variables). Update the signature of bdrv_round_to_clusters() to uniformly use int64_t, matching the signature already chosen for bdrv_is_allocated and the fact that off_t is also a signed type, then adjust clients according to the required fallout (even where the result could now exceed 32 bits, no client is directly assigning the result into a 32-bit value without breaking things into a loop first). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	298a1665a2	block: Allow NULL file for bdrv_get_block_status() Not all callers care about which BDS owns the mapping for a given range of the file. This patch merely simplifies the callers by consolidating the logic in the common call point, while guaranteeing a non-NULL file to all the driver callbacks, for no semantic change. The only caller that does not care about pnum is bdrv_is_allocated, as invoked by vvfat; we can likewise add assertions that the rest of the stack does not have to worry about a NULL pnum. Furthermore, this will also set the stage for a future cleanup: when a caller does not care about which BDS owns an offset, it would be nice to allow the driver to optimize things to not have to return BDRV_BLOCK_OFFSET_VALID in the first place. In the case of fragmented allocation (for example, it's fairly easy to create a qcow2 image where consecutive guest addresses are not at consecutive host addresses), the current contract requires bdrv_get_block_status() to clamp pnum to the limit where host addresses are no longer consecutive, but allowing a NULL file means that pnum could be set to the full length of known-allocated data. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Vladimir Sementsov-Ogievskiy	ce960aa906	block/mirror: check backing in bdrv_mirror_top_flush Backing may be zero after failed bdrv_append in mirror_start_job, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:48 +02:00
Vladimir Sementsov-Ogievskiy	18775ff326	block/mirror: check backing in bdrv_mirror_top_refresh_filename Backing may be zero after failed bdrv_attach_child in bdrv_set_backing_hd, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:47 +02:00
Eric Blake	23ca459a45	mirror: Switch mirror_dirty_init() to byte-based iteration Now that we have adjusted the majority of the calls this function makes to be byte-based, it is easier to read the code if it makes passes over the image using bytes rather than sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	e0d7f73e63	dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes Some of the callers were already scaling bytes to sectors; others can be easily converted to pass byte offsets, all in our shift towards a consistent byte interface everywhere. Making the change will also make it easier to write the hold-out callers to use byte rather than sectors for their iterations; it also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Although all callers happen to pass sector-aligned values, make the internal scaling robust to any sub-sector requests. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	3b5d4df0c6	dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes Half the callers were already scaling bytes to sectors; the other half can eventually be simplified to use byte iteration. Both callers were already using the result as a bool, so make that explicit. Making the change also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Remember, asking whether a byte is dirty is effectively asking whether the entire granularity containing the byte is dirty, since we only track dirtiness by granularity. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	9a46dba7b7	dirty-bitmap: Change bdrv_get_dirty_count() to report bytes Thanks to recent cleanups, all callers were scaling a return value of sectors into bytes; do the scaling internally instead. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	f798184cfd	dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset Thanks to recent cleanups, most callers were scaling a return value of sectors into bytes (the exception, in qcow2-bitmap, will be converted to byte-based iteration later). Update the interface to do the scaling internally instead. In qcow2-bitmap, the code was specifically checking for an error return of -1. To avoid a regression, we either have to make sure we continue to return -1 (rather than a scaled -512) on error, or we have to fix the caller to treat all negative values as error rather than just one magic value. It's easy enough to make both changes at the same time, even though either one in isolation would work. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	715a74d819	dirty-bitmap: Set iterator start by offset, not sector All callers to bdrv_dirty_iter_new() passed 0 for their initial starting point, drop that parameter. Most callers to bdrv_set_dirty_iter() were scaling a byte offset to a sector number; the exception qcow2-bitmap will be converted later to use byte rather than sector iteration. Move the scaling to occur internally to dirty bitmap code instead, so that callers now pass in bytes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Kevin Wolf	e0995dc3da	block: Add reopen_queue to bdrv_child_perm() In the context of bdrv_reopen(), we'll have to look at the state of the graph as it will be after the reopen. This interface addition is in preparation for the change. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-09-26 14:46:23 +02:00
Manos Pitsidianakis	f7cc69b326	block: add default implementations for bdrv_co_get_block_status() bdrv_co_get_block_status_from_file() and bdrv_co_get_block_status_from_backing() set *file to bs->file and bs->backing respectively, so that bdrv_co_get_block_status() can recurse to them. Future block drivers won't have to duplicate code to implement this. Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:31:13 +02:00
Fam Zheng	045a2f8254	mirror: Mark target BB as "force allow inactivate" Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-4-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Kevin Wolf	d3c8c67469	block: Skip implicit nodes in query-block/blockstats Commits `0db832f` and `6cdbceb` introduced the automatic insertion of filter nodes above the top layer of mirror and commit block jobs. The assumption made there was that since libvirt doesn't do node-level management of the block layer yet, it shouldn't be affected by added nodes. This is true as far as commands issued by libvirt are concerned. It only uses BlockBackend names to address nodes, so any operations it performs still operate on the root of the tree as intended. However, the assumption breaks down when you consider query commands, which return data for the wrong node now. These commands also return information on some child nodes (bs->file and/or bs->backing), which libvirt does make use of, and which refer to the wrong nodes, too. One of the consequences is that oVirt gets wrong information about the image size and stops the VM in response as long as a mirror or commit job is running: https://bugzilla.redhat.com/show_bug.cgi?id=1470634 This patch fixes the problem by hiding the implicit nodes created automatically by the mirror and commit block jobs in the output of query-block and BlockBackend-based query-blockstats as long as the user doesn't indicate that they are aware of those nodes by providing a node name for them in the QMP command to start the block job. The node-based commands query-named-block-nodes and query-blockstats with query-nodes=true still show all nodes, including implicit ones. This ensures that users that are capable of node-level management can still access the full information; users that only know BlockBackends won't use these commands. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Tested-by: Eric Blake <eblake@redhat.com>	2017-07-24 15:06:04 +02:00
Max Reitz	3a691c50f1	block: Add PreallocMode to blk_truncate() blk_truncate() itself will pass that value to bdrv_truncate(), and all callers of blk_truncate() just set the parameter to PREALLOC_MODE_OFF for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Eric Blake	51b0a48888	block: Make bdrv_is_allocated_above() byte-based We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status. Therefore, for the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly stream_run(), gets a lot simpler because it no longer has to mess with sectors. Leave comments where we can further simplify by switching to byte-based iterations, once later patches eliminate the need for sector-aligned operations. For ease of review, bdrv_is_allocated() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:07 +02:00
Eric Blake	fb2ef7919b	mirror: Switch mirror_iteration() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the internal loop iteration of mirroring to track by bytes instead of sectors (although we are still guaranteed that we iterate by steps that are both sector-aligned and multiples of the granularity). Drop the now-unused mirror_clip_sectors(). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	ae4cc8777b	mirror: Switch mirror_do_read() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function, preserving all existing semantics, and adding one more assertion that things are still sector-aligned (so that conversions to sectors in mirror_read_complete don't need to round). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	782d97efec	mirror: Switch mirror_cow_align() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change), and add mirror_clip_bytes() as a counterpart to mirror_clip_sectors(). Some of the conversion is a bit tricky, requiring temporaries to convert between units; it will be cleared up in a following patch. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	931e52607f	mirror: Update signature of mirror_clip_sectors() Rather than having a void function that modifies its input in-place as the output, change the signature to reduce a layer of indirection and return the result. Suggested-by: John Snow <jsnow@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	e6f2419389	mirror: Switch mirror_do_zero_or_discard() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	b436982f04	mirror: Switch MirrorBlockJob to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Continue by converting an internal structure (no semantic change), and all references to the buffer size. Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS (necessary for interaction with sector-based dirty bitmaps, until a later patch converts those to be byte-based) does not suffer from truncation problems. [checkpatch has a false positive on use of MIN() in this patch] Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	5cb1a49e01	trace: Show blockjob actions via bytes, not sectors Upcoming patches are going to switch to byte-based interfaces instead of sector-based. Even worse, trace_backup_do_cow_enter() had a weird mix of cluster and sector indices. The trace interface is low enough that there are no stability guarantees, and therefore nothing wrong with changing our units, even in cases like trace_backup_do_cow_skip() where we are not changing the trace output. So make the tracing uniformly use bytes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	f3e4ce4af3	blockjob: Track job ratelimits via bytes, not sectors The user interface specifies job rate limits in bytes/second. It's pointless to have our internal representation track things in sectors/second, particularly since we want to move away from sector-based interfaces. Fix up a doc typo found while verifying that the ratelimit code handles the scaling difference. Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be cleaned up later when functions are converted to iterate over images by bytes rather than by sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	d5254033da	block: Simplify use of BDRV_BLOCK_RAW The lone caller that cares about a return of BDRV_BLOCK_RAW (namely, io.c:bdrv_co_get_block_status) completely replaces the return value, so there is no point in passing BDRV_BLOCK_DATA. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Manos Pitsidianakis	f5a5ca7969	block: change variable names in BlockDriverState Change the 'int count' parameter in pwrite_zeros, pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Paolo Bonzini	b64bd51efa	block: protect modification of dirty bitmaps with a mutex Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-17-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	2119882c7e	block: introduce dirty_bitmap_mutex It protects only the list of dirty bitmaps; in the next patch we will also protect their content. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-15-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Stefan Hajnoczi	0748b3526e	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJZLDGTAAoJEH8JsnLIjy/WjY0P/jtXF+SQKs9H695Uy+U+EUr5 9w+lYTfjHXTUx9F5AKT4OAMww/uWSG5ywG8FZD8AJZyt1B8piDrSA7sW5YYgWCV4 XD1HMzN6pasVGrchzgfBSW/K9BgUlvBEjqFrLCNVii0osa68J7vX3uJsLyRA+yPo ijFseJaB/b0KnAPG9JqHXc4DVQgU2IhZH3ZsgE6Utb+KUrm8/veiORhWQboBbfCW eyTF+YWee/rI+up4nKn9+Le57EUXWr3Y50BBxFZw1/4wQbv0WEhOIbl/Z4ggxNRR VoyXGMqOIZFlGQ7bGxDawuEwGKfcOFFEHj3rhPrs7RPnod/6X8Vxm0rAr2GNZoOW 9tLMQKe4HLo3kVwX65AhzWd/WMukAd0MC/0oaULOjiAEdlWjflyEhx9H9uYefl5x kn77ZqQqIEL4aCYRBLJn9oJV3CIZbSd6cuKC9UPuXAwD8XDWTXUzT/aDMJUnfij7 vhS1qks1Wyk37wIjTuuERwrr0K6y8e3De30NeU6LqQuzXvJ1MYSp49BA1FCgHfmT x5RWbTSjLdjZiIo7biE2dSwdOtsxsnuxX19utqfGqOmegNU7LykRtu6mFh9kJhNe 5ladt1Mu//puZLQ/q4RH/J0pwkuS/OVrS3LBobcggPGHhUIiHhdNUI73bb7BK0+9 ME5DGLm4/c/REB3Lidr9 =kkU0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches # gpg: Signature made Mon 29 May 2017 03:34:59 PM BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: block/file-: _parse_filename() and colons block: Fix backing paths for filenames with colons block: Tweak error message related to qemu-img amend qemu-img: Fix leakage of options on error qemu-img: copy *key-secret opts when opening newly created files qemu-img: introduce --target-image-opts for 'convert' command qemu-img: fix --image-opts usage with dd command qemu-img: add support for --object with 'dd' command qemu-img: Fix documentation of convert qcow2: remove extra local_error variable mirror: Drop permissions on s->target on completion nvme: Add support for Controller Memory Buffers iotests: 147: Don't test inet6 if not available qemu-iotests: Test streaming with missing job ID stream: fix crash in stream_start() when block_job_create() fails Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 14:15:15 +01:00

1 2 3 4 5 ...

325 Commits