qemu-e2k

Author	SHA1	Message	Date
Paolo Bonzini	bdffb31d8e	mirror: do not flush every time the disks are synced This puts a huge strain on the disks when there are many concurrent migrations. With this patch we only flush twice: just before issuing the event, and just before pivoting to the destination. If management will complete the job close to the BLOCK_JOB_READY event, the cost of the second flush should be small anyway. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161109162008.27287-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:49:26 -05:00
John Snow	5ccac6f186	blockjob: add block_job_start Instead of automatically starting jobs at creation time via backup_start et al, we'd like to return a job object pointer that can be started manually at later point in time. For now, add the block_job_start mechanism and start the jobs automatically as we have been doing, with conversions job-by-job coming in later patches. Of note: cancellation of unstarted jobs will perform all the normal cleanup as if the job had started, particularly abort and clean. The only difference is that we will not emit any events, because the job never actually started. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	a7815a764c	blockjob: add .start field Add an explicit start field to specify the entrypoint. We already have ownership of the coroutine itself AND managing the lifetime of the coroutine, let's take control of creation of the coroutine, too. This will allow us to delay creation of the actual coroutine until we know we'll actually start a BlockJob in block_job_start. This avoids the sticky question of how to "un-create" a Coroutine that hasn't been started yet. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478587839-9834-4-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	c87621ea68	blockjobs: split interface into public/private, Part 1 To make it a little more obvious which functions are intended to be public interface and which are intended to be for use only by jobs themselves, split the interface into "public" and "private" files. Convert blockjobs (e.g. block/backup) to using the private interface. Leave blockdev and others on the public interface. There are remaining uses of private state by qemu-img, and several cases in blockdev.c and block/io.c where we grab job->blk for the purposes of acquiring an AIOContext. These will be corrected in future patches. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-7-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 08:04:56 -04:00
John Snow	8254b6d953	blockjob: centralize QMP event emissions There's no reason to leave this to blockdev; we can do it in blockjobs directly and get rid of an extra callback for most users. All non-internal events, even those created outside of QMP, will consistently emit events. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
John Snow	47970dfb0a	Replication/Blockjobs: Create replication jobs as internal Bubble up the internal interface to commit and backup jobs, then switch replication tasks over to using this methodology. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-4-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
John Snow	f81e0b4532	blockjobs: Allow creating internal jobs Add the ability to create jobs without an ID. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-3-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
Fam Zheng	6f13acf97e	block: Turn on "unmap" in active commit We already specified BDRV_O_UNMAP when opening images in 'qemu-img commit', but didn't turn on the "unmap" in the active commit job. This patch fixes that so that zeroed clusters in top image can be discarded which is desired in the virt-sparsify use case, where a temporary overlay is created and fstrim'ed before commiting back, to free space in the original image. This also enables it for block-commit. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1474974892-5031-1-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
Alberto Garcia	f3ede4b05d	block: Block all intermediate nodes in commit_active_start() When block-commit is launched without the top parameter, it uses internally a mirror block job. In that case all intermediate nodes between the active and base nodes must be blocked as well. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-10-31 16:52:38 +01:00
Alberto Garcia	cee3c6b5ca	block: Use block_job_add_bdrv() in mirror_start_job() Use block_job_add_bdrv() instead of blocking all operations in mirror_start_job() and unblocking them in mirror_exit(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-10-31 16:52:38 +01:00
Paolo Bonzini	9a0cec664e	mirror: use bdrv_drained_begin/bdrv_drained_end Ensure that there are no changes between the last check to bdrv_get_dirty_count and the switch to the target. There is already a bdrv_drained_end call, we only need to ensure that bdrv_drained_begin is not called twice. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-Id: <1477565348-5458-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2016-10-28 21:50:18 +08:00
Paolo Bonzini	bae8196d9f	blockjob: introduce .drain callback for jobs This is required to decouple block jobs from running in an AioContext. With multiqueue block devices, a BlockDriverState does not really belong to a single AioContext. The solution is to first wait until all I/O operations are complete; then loop in the main thread for the block job to complete entirely. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-Id: <1477565348-5458-3-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2016-10-28 21:50:18 +08:00
Fam Zheng	dc162c8e4f	block: Hide HBitmap in block dirty bitmap interface HBitmap is an implementation detail of block dirty bitmap that should be hidden from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying HBitmapIter. A small difference in the interface is, before, an HBitmapIter is initialized in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because the structure definition is in block/dirty-bitmap.c. Two current users are converted too. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1476395910-8697-2-git-send-email-jsnow@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-10-24 17:56:07 +02:00
Wen Congyang	b49f7ead8d	mirror: auto complete active commit Auto complete mirror job in background to prevent from blocking synchronously Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-09-13 11:00:56 +01:00
Vladimir Sementsov-Ogievskiy	dbaa7b57ec	mirror: finish earlier on error Stop to produce new async copy requests from mirror_iteration if critical error (error action = BLOCK_ERROR_ACTION_REPORT) detected. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-08-08 13:05:43 +02:00
Vladimir Sementsov-Ogievskiy	0965a41e99	mirror: double performance of the bulk stage if the disc is full Mirror can do up to 16 in-flight requests, but actually on full copy (the whole source disk is non-zero) in-flight is always 1. This happens as the request is not limited in size: the data occupies maximum available capacity of s->buf. The patch limits the size of the request to some artificial constant (1 Mb here), which is not that big or small. This effectively enables back parallelism in mirror code as it was designed. The result is important: the time to migrate 10 Gb disk is reduced from ~350 sec to 170 sec. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1468516741-82174-1-git-send-email-vsementsov@virtuozzo.com CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-26 16:23:36 -04:00
Peter Maydell	206d0c2436	pc, pci, virtio: new features, cleanups, fixes - interrupt remapping for intel iommus - a bunch of virtio cleanups - fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXkQsqAAoJECgfDbjSjVRpanoIAJ9JVlc1aEjt9sa0cSBcs+NQ J7JmgU9FqFsj+4FrNTouO3AxTjHurd1UAULP1WMPD+V3JpbnHct8r6SCBLQ5EBMN VOjYo4DwWs1g+DqnQ9WZmbadu06XvYi/yiAKNUzWfZk0MR11D0D/S5hmarNKw0Kq tGHeTWjGeY4WqFLV7m+qB4+cqkAByn6um99UtUvgLL05RgIEIP2IEMKYZ+rXvAa9 iGUvzqlO7mbq/+LbL18kaWywa4TCwbbd2eSGWaqhX4CuB62Rl33mWTXFcfaYhkyp Z3FgwaJ09h0lAjSVEbyAuLFMfO/BnMcsoKqwl4xc4vkn/xBCqFtgH9JcEVm3O8U= =ge2D -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging pc, pci, virtio: new features, cleanups, fixes - interrupt remapping for intel iommus - a bunch of virtio cleanups - fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 21 Jul 2016 18:49:30 BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (57 commits) intel_iommu: avoid unnamed fields virtio: Update migration docs virtio-gpu: Wrap in vmstate virtio-gpu: Use migrate_add_blocker for virgl migration blocking virtio-input: Wrap in vmstate 9pfs: Wrap in vmstate virtio-serial: Wrap in vmstate virtio-net: Wrap in vmstate virtio-balloon: Wrap in vmstate virtio-rng: Wrap in vmstate virtio-blk: Wrap in vmstate virtio-scsi: Wrap in vmstate virtio: Migration helper function and macro virtio-serial: Remove old migration version support virtio-net: Remove old migration version support virtio-scsi: Replace HandleOutput typedef Revert "mirror: Workaround for unexpected iohandler events during completion" virtio-scsi: Call virtio_add_queue_aio virtio-blk: Call virtio_add_queue_aio virtio: Introduce virtio_add_queue_aio ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-21 20:12:37 +01:00
Fam Zheng	d4a92a8420	Revert "mirror: Workaround for unexpected iohandler events during completion" This reverts commit `ab27c3b5e7`. The virtio storage device host notifiers now work with bdrv_drained_begin/end, so we don't need this hack any more. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-21 20:44:19 +03:00
Peter Maydell	61ead113ae	Pull request v2: * Resolved merge conflict with block/iscsi.c [Peter] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXj6TkAAoJEJykq7OBq3PI15oIAL24OFnMTFOAWyY2h3yfzGwe Gn1qzFOTHCXgbgVsolVuZnFU/SDn9WzMS9y6o7jcJykv++FdMSYUO7paPQAg9etX o9KOgyfUUDhskbaXEbTKxhqy7UvcyJsZYmjN+TD5tupfnrqGlmu6rkUPpSwGo++G u7CkAZewJc5SvsCdQRe3N10LbG4xU/j06cjCIHWDPRr+AV9qvsb9KkIOg3wkrhWN R83yj+yX6vhfQouHLoOlh3RYZ5HNo020JPum0jT1tGcoshKoFyu0prSRqNVVMqTY BepvWMhXbiq219HnFaQBtnysve9gWix4WhuCHtaAlaZDTBh945ZztsW7w7dubtA= =7ics -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request v2: * Resolved merge conflict with block/iscsi.c [Peter] # gpg: Signature made Wed 20 Jul 2016 17:20:52 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: (25 commits) raw_bsd: Convert to byte-based interface nbd: Convert to byte-based interface block: Kill .bdrv_co_discard() sheepdog: Switch .bdrv_co_discard() to byte-based raw_bsd: Switch .bdrv_co_discard() to byte-based qcow2: Switch .bdrv_co_discard() to byte-based nbd: Switch .bdrv_co_discard() to byte-based iscsi: Switch .bdrv_co_discard() to byte-based gluster: Switch .bdrv_co_discard() to byte-based blkreplay: Switch .bdrv_co_discard() to byte-based block: Add .bdrv_co_pdiscard() driver callback block: Convert .bdrv_aio_discard() to byte-based rbd: Switch rbd_start_aio() to byte-based raw-posix: Switch paio_submit() to byte-based block: Convert BB interface to byte-based discards block: Convert bdrv_aio_discard() to byte-based block: Switch BlockRequest to byte-based block: Convert bdrv_discard() to byte-based block: Convert bdrv_co_discard() to byte-based iscsi: Rely on block layer to break up large requests ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Conflicts: block/gluster.c	2016-07-21 11:00:36 +01:00
Eric Blake	1c6c4bb7f0	block: Convert BB interface to byte-based discards Change sector-based blk_discard(), blk_co_discard(), and blk_aio_discard() to instead be byte-based blk_pdiscard(), blk_co_pdiscard(), and blk_aio_pdiscard(). NBD gets a lot simpler now that ignoring the unaligned portion of a byte-based discard request is handled under the hood by the block layer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:55 +01:00
Denis V. Lunev	cf56a3c632	mirror: fix request throttling in drive-mirror There are 2 deficiencies here: - mirror_iteration could start several requests inside. Thus we could simply have more in_flight requests than MAX_IN_FLIGHT. - keeping this in mind throttling in mirror_run which is checking s->in_flight == MAX_IN_FLIGHT is wrong. The patch adds the check and throttling into mirror_iteration and fixes the check in mirror_run() to be sure. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1466598927-5990-1-git-send-email-den@openvz.org CC: Jeff Cody <jcody@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> (cherry picked from commit e648dc95c28fbca12e67be26a1fc4b9a0676c3fe)	2016-07-19 17:03:44 -04:00
Denis V. Lunev	4b5004d9fc	mirror: improve performance of mirroring of empty disk We should not take into account zero blocks for delay calculations. They are not read and thus IO throttling is not required. In the other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes days. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-9-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 17:02:49 -04:00
Denis V. Lunev	c7c2769c0e	mirror: efficiently zero out target With a bdrv_co_write_zeroes method on a target BDS and when this method is working as indicated by the bdrv_can_write_zeroes_with_unmap(), zeroes will not be placed into the wire. Thus the target could be very efficiently zeroed out. This should be done with the largest chunk possible. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-8-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	b7d5062c9c	mirror: optimize dirty bitmap filling in mirror_run a bit There is no need to scan allocation tables if we have mark_all_dirty flag set. Just mark it all dirty. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-7-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	c0b363ad43	mirror: create mirror_dirty_init helper for mirror_run The code inside the helper will be extended in the next patch. mirror_run itself is overbloated at the moment. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-5-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	49efb1f5b0	mirror: create mirror_throttle helper The patch also places last_pause_ns from stack in mirror_run into MirrorBlockJob structure. This helper will be useful in next patches. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1468503209-19498-4-git-send-email-den@openvz.org CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> CC: Eric Blake <eblake@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Denis V. Lunev	531509ba28	mirror: make sectors_in_flight int64_t We keep here the sum of int fields. Thus this could easily overflow, especially when we will start sending big requests in next patches. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-3-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Sascha Silbe	f14a39ccb9	Improve block job rate limiting for small bandwidth values ratelimit_calculate_delay() previously reset the accounting every time slice, no matter how much data had been processed before. This had (at least) two consequences: 1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream. Not sure if there are real-world use cases where this would be a problem. Mirroring and backup over a slow link (e.g. DSL) would come to mind, though. 2. Tests for block job operations (e.g. cancel) were rather racy All block jobs currently use a time slice of 100ms. That's a reasonable value to get smooth output during regular operation. However this also meant that the state of block jobs changed every 100ms, no matter how low the configured limit was. On busy hosts, qemu often transferred additional chunks until the test case had a chance to cancel the job. Fix the block job rate limit code to delay for more than one time slice to address the above issues. To make it easier to handle oversized chunks we switch the semantics from returning a delay _before_ the current request to a delay _after_ the current request. If necessary, this delay consists of multiple time slice units. Since the mirror job sends multiple chunks in one go even if the rate limit was exceeded in between, we need to keep track of the start of the current time slice so we can correctly re-compute the delay for the updated amount of data. The minimum bandwidth now is 1 data unit per time slice. The block jobs are currently passing the amount of data transferred in sectors and using 100ms time slices, so this translates to 5120 bytes/second. With chunk sizes usually being O(512KiB), tests have plenty of time (O(100s)) to operate on block jobs. The chance of a race condition now is fairly remote, except possibly on insanely loaded systems. Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:38 +02:00
Paolo Bonzini	0b8b8753e4	coroutine: move entry argument to qemu_coroutine_create In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	fd62c609ed	commit: Add 'job-id' parameter to 'block-commit' This patch adds a new optional 'job-id' parameter to 'block-commit', allowing the user to specify the ID of the block job to be created. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	71aa98678c	mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror' This patch adds a new optional 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror', allowing the user to specify the ID of the block job to be created. The HMP 'drive_mirror' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	7f0317cfc8	blockjob: Add 'job_id' parameter to block_job_create() When a new job is created, the job ID is taken from the device name of the BDS. This patch adds a new 'job_id' parameter to let the caller provide one instead. This patch also verifies that the ID is always unique and well-formed. This causes problems in a couple of places where no ID is being set, because the BDS does not have a device name. In the case of test_block_job_start() (from test-blockjob-txn.c) we can simply use this new 'job_id' parameter to set the missing ID. In the case of img_commit() (from qemu-img.c) we still don't have the API to make commit_active_start() set the job ID, so we solve it by setting a default value. We'll get rid of this as soon as we extend the API. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	9df229c3ca	blockjob: Update description of the 'id' field The 'id' field of the BlockJob structure will be able to hold any ID, not only a device name. This patch updates the description of that field and the error messages where it is being used. Soon we'll add the ability to set an arbitrary ID when creating a block job. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Changlong Xie	15d6729850	mirror: fix misleading comments s/target bs/to_replace/, also we check to_replace bs is not blocked in qmp_drive_mirror() not here Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1466672241-22485-3-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 23:08:25 -04:00
John Snow	e4808881cb	mirror: limit niov to IOV_MAX elements, again During the refactor of mirror_iteration in `e5b43573`, we regressed the fix introduced in `cae98cb8`. This patch re-adds IOV_MAX checking to cases where we aren't checking alignment (and size) already. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466625064-11280-3-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:53:03 -04:00
John Snow	176129552f	mirror: clarify mirror_do_read return code mirror_do_read intends to return the number of sectors processed after the starting sector, without regard to how many sectors were processed before the starting sector due to alignment. Clean up the comments and code to hopefully illustrate this more clearly. This also fixes an issue in initialization where if the mirror buffer size is initialized to smaller than the number of sectors being requested for transfer, we report back an incorrectly large number to the caller. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466625064-11280-2-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:53:03 -04:00
Denis V. Lunev	ff04198bf5	mirror: fix trace_mirror_yield_in_flight usage in mirror_iteration() trace_mirror_yield_in_flight accepts 2nd arguments in sectors while here we pass chunks instead. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466518157-27140-1-git-send-email-den@openvz.org CC: Jeff Cody <jcody@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-06-28 22:52:45 -04:00
Stefan Hajnoczi	565ac01f8d	mirror: follow AioContext change gracefully Add block_job_pause_point() calls to mark quiescent points and make sure to complete in-flight requests when switching AioContexts. This patch solves undefined behavior in the mirror block job when the BDS AioContext is changed by dataplane. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-8-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 14:25:41 +01:00
Max Reitz	274fccee2b	block/mirror: Fix target backing BDS Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Kevin Wolf	244483e64e	block: Byte-based bdrv_co_do_copy_on_readv() In a first step to convert the common I/O path to work on bytes rather than sectors, this converts the copy-on-read logic that is used by bdrv_aligned_preadv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Eric Blake	73698c30ca	block: Avoid bogus flags during mirroring Commit `e253f4b8` converted mirroring from sector-based bdrv_aio_* to byte-based blk_aio_*, but failed to account for the subtle difference in signatures (the former takes a semi-redundant length, the latter takes a flags parameter). Since all of our flags are currently smaller in size than BDRV_SECTOR_SIZE, it has no ill effects until we either perform sub-sector mirroring, or we start asserting that no unexpected flags are set. I found it while testing new asserts when qemu-iotests 132 started warning about an unknown flag 0x200000. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	e253f4b897	mirror: Use BlockBackend for I/O This changes the mirror block job to use the job's BlockBackend for performing its I/O. job->bs isn't used by the mirroring code any more afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	b880481579	mirror: Allow target that already has a BlockBackend We had to forbid mirroring to a target BDS that already had a BB attached because the node swapping at job completion would add a second BB and we didn't support multiple BBs on a single BDS at the time. Now we do, so we can lift the restriction. As we allow additional BlockBackends for the target, we must expect other users to be sending requests. There may no requests be in flight during the graph modification, so we have to drain those users now. The core part of this patch is a revert of commit `40365552`. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	b6d2e59995	block: Convert block job core to BlockBackend This adds a new BlockBackend field to the BlockJob struct, which coexists with the BlockDriverState while converting the individual jobs. When creating a block job, a new BlockBackend is created on top of the given BlockDriverState, and it is destroyed when the BlockJob ends. The reference to the BDS is now held by the BlockBackend instead of calling bdrv_ref/unref manually. We have to be careful when we use bdrv_replace_in_backing_chain() in block jobs because this changes the BDS that job->blk points to. At the moment block jobs are too tightly coupled with their BDS, so that moving a job to another BDS isn't easily possible; therefore, we need to just manually undo this change afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	1f0c461b82	block: Remove BlockDriverState.blk This patch removes the remaining users of bs->blk, which will allow us to have multiple BBs on top of a single BDS. In the meantime, all checks that are currently in place to prevent the user from creating such setups can be switched to bdrv_has_blk() instead of accessing BDS.blk. Future patches can allow them and e.g. enable users to mirror to a block device that already has a BlockBackend on it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	66a0fae438	blockjob: Don't touch BDS iostatus Block jobs don't actually make use of the iostatus for their BDSes, but they manage a separate block job iostatus. Still, they require that it is enabled for the source BDS and they enable it automatically for the target and set the error handling mode - which ends up never being used by the job. This patch removes all of the BDS iostatus handling from the block job, which removes another few bs->blk accesses. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	81e254dc83	blockjob: Don't set iostatus of target When block job errors were introduced, we assigned the iostatus of the target BDS "just in case". The field has never been accessible for the user because the target isn't listed in query-block. Before we can allow the user to have a second BlockBackend on the target, we need to clean this up. If anything, we would want to set the iostatus for the internal BB of the job (which we can always do later), but certainly not for a separate BB which the job doesn't even use. As a nice side effect, this gets us rid of another bs->blk use. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Fam Zheng	ab27c3b5e7	mirror: Workaround for unexpected iohandler events during completion Commit `5a7e7a0ba` moved mirror_exit to a BH handler but didn't add any protection against new requests that could sneak in just before the BH is dispatched. For example (assuming a code base at that commit): main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch aio_dispatch ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop qemu_iohandler_poll virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 <snip> aio_dispatch aio_bh_poll (c) mirror_exit At (a) we know the BDS has no pending request. However, the same main_loop_wait call is going to dispatch iohandlers (EventNotifier events), which may lead to a new I/O from guest. So the invariant is already broken at (c). Data loss. Commit `f3926945c8` made iohandler to use aio API. The order of virtio_queue_host_notifier_read and block_job_defer_to_main_loop within a main_loop_wait becomes unpredictable, and even worse, if the host notifier event arrives at the next main_loop_wait call, the unpredictable order between mirror_exit and virtio_queue_host_notifier_read is also a trouble. As shown below, this commit made the bug easier to trigger: - Bug case 1: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 ... aio_dispatch aio_bh_poll (c) mirror_exit - Bug case 2: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop main_loop_wait # 2 ... aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite aio_dispatch aio_bh_poll (c) mirror_exit In both cases, (b) breaks the invariant wanted by (a) and (c). Until then, the request loss has been silent. Later, `3f09bfbc7b` added asserts at (c) to check the invariant (in bdrv_replace_in_backing_chain), and Max reported an assertion failure first visible there, by doing active committing while the guest is running bonnie++. 2.5 added bdrv_drained_begin at (a) to protect the dataplane case from similar problems, but we never realize the main loop bug until now. As a bandage, this patch disables iohandler's external events temporarily together with bs->ctx. Launchpad Bug: 1570134 Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-22 16:44:09 +02:00
Fam Zheng	4150ae60eb	mirror: Don't extend the last sub-chunk The last sub-chunk is rounded up to the copy granularity in the target image, resulting in a larger size than the source. Add a function to clip the copied sectors to the end. This undoes the "wrong" changes to tests/qemu-iotests/109.out in `e5b43573e2`. The remaining two offset changes are okay. [ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ] Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2016-04-20 16:52:55 +02:00
Max Reitz	f27a274259	block/mirror: Refresh stale bitmap iterator cache If the drive's dirty bitmap is dirtied while the mirror operation is running, the cache of the iterator used by the mirror code may become stale and not contain all dirty bits. This only becomes an issue if we are looking for contiguously dirty chunks on the drive. In that case, we can easily detect the discrepancy and just refresh the iterator if one occurs. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-20 16:52:55 +02:00

1 2 3

148 Commits