qemu-e2k

Author	SHA1	Message	Date
Eric Blake	b8d0a9804d	block: Cater to iscsi with non-power-of-2 discard Dell Equallogic iSCSI SANs have a very unusual advertised geometry: $ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0 wsnz:0 maximum compare and write length:1 optimal transfer length granularity:0 maximum transfer length:0 optimal transfer length:0 maximum prefetch xdread xdwrite transfer length:0 maximum unmap lba count:30720 maximum unmap block descriptor count:2 optimal unmap granularity:30720 ugavalid:1 unmap granularity alignment:0 maximum write same length:30720 which says that both the maximum and the optimal discard size is 15M. It is not immediately apparent if the device allows discard requests not aligned to the optimal size, nor if it allows discards at a finer granularity than the optimal size. I tried to find details in the SCSI Commands Reference Manual Rev. A on what valid values of maximum and optimal sizes are permitted, but while that document mentions a "Block Limits VPD Page", I couldn't actually find documentation of that page or what values it would have, or if a SCSI device has an advertisement of its minimal unmap granularity. So it is not obvious to me whether the Dell Equallogic device is compliance with the SCSI specification. Fortunately, it is easy enough to support non-power-of-2 sizing, even if it means we are less efficient than truly possible when targetting that device (for example, it means that we refuse to unmap anything that is not a multiple of 15M and aligned to a 15M boundary, even if the device truly does support a smaller granularity where unmapping actually works). Reported-by: Peter Lieven <pl@kamp.de> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-08-03 18:44:57 +02:00
Peter Maydell	61ead113ae	Pull request v2: * Resolved merge conflict with block/iscsi.c [Peter] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXj6TkAAoJEJykq7OBq3PI15oIAL24OFnMTFOAWyY2h3yfzGwe Gn1qzFOTHCXgbgVsolVuZnFU/SDn9WzMS9y6o7jcJykv++FdMSYUO7paPQAg9etX o9KOgyfUUDhskbaXEbTKxhqy7UvcyJsZYmjN+TD5tupfnrqGlmu6rkUPpSwGo++G u7CkAZewJc5SvsCdQRe3N10LbG4xU/j06cjCIHWDPRr+AV9qvsb9KkIOg3wkrhWN R83yj+yX6vhfQouHLoOlh3RYZ5HNo020JPum0jT1tGcoshKoFyu0prSRqNVVMqTY BepvWMhXbiq219HnFaQBtnysve9gWix4WhuCHtaAlaZDTBh945ZztsW7w7dubtA= =7ics -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request v2: * Resolved merge conflict with block/iscsi.c [Peter] # gpg: Signature made Wed 20 Jul 2016 17:20:52 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: (25 commits) raw_bsd: Convert to byte-based interface nbd: Convert to byte-based interface block: Kill .bdrv_co_discard() sheepdog: Switch .bdrv_co_discard() to byte-based raw_bsd: Switch .bdrv_co_discard() to byte-based qcow2: Switch .bdrv_co_discard() to byte-based nbd: Switch .bdrv_co_discard() to byte-based iscsi: Switch .bdrv_co_discard() to byte-based gluster: Switch .bdrv_co_discard() to byte-based blkreplay: Switch .bdrv_co_discard() to byte-based block: Add .bdrv_co_pdiscard() driver callback block: Convert .bdrv_aio_discard() to byte-based rbd: Switch rbd_start_aio() to byte-based raw-posix: Switch paio_submit() to byte-based block: Convert BB interface to byte-based discards block: Convert bdrv_aio_discard() to byte-based block: Switch BlockRequest to byte-based block: Convert bdrv_discard() to byte-based block: Convert bdrv_co_discard() to byte-based iscsi: Rely on block layer to break up large requests ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Conflicts: block/gluster.c	2016-07-21 11:00:36 +01:00
Eric Blake	02aefe43cb	block: Kill .bdrv_co_discard() Now that all drivers have a byte-based .bdrv_co_pdiscard(), we no longer need to worry about the sector-based version. We can also relax our minimum alignment to 1 for drivers that support it. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:24:25 +01:00
Eric Blake	47a5486d59	block: Add .bdrv_co_pdiscard() driver callback There's enough drivers with a sector-based callback that it will be easier to switch one at a time. This patch adds a byte-based callback, and then after all drivers are swapped, we'll drop the sector-based callback. [checkpatch doesn't like the space after coroutine_fn in block_int.h, but it's consistent with the rest of the file] Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:55 +01:00
Eric Blake	4da444a0bb	block: Convert .bdrv_aio_discard() to byte-based Another step towards byte-based interfaces everywhere. Replace the sector-based driver callback .bdrv_aio_discard() with a new byte-based .bdrv_aio_pdiscard(). Only raw-posix and RBD drivers are affected, so it was not worth splitting into multiple patches. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:55 +01:00
Denis V. Lunev	6d07859926	dirty-bitmap: operate with int64_t amount Underlying HBitmap operates even with uint64_t. Thus this change is safe. This would be useful f.e. to mark entire bitmap dirty in one call. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-2-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-07-19 16:54:46 -04:00
Evgeny Yakovlev	3ff2f67a7c	block: ignore flush requests when storage is clean Some guests (win2008 server for example) do a lot of unnecessary flushing when underlying media has not changed. This adds additional overhead on host when calling fsync/fdatasync. This change introduces a write generation scheme in BlockDriverState. Current write generation is checked against last flushed generation to avoid unnessesary flushes. The problem with excessive flushing was found by a performance test which does parallel directory tree creation (from 2 processes). Results improved from 0.424 loops/sec to 0.432 loops/sec. Each loop creates 10^3 directories with 10 files in each. This affected some blkdebug testcases that were expecting error logs from failure-injected flushes which are now skipped entirely (tests 026 071 089). This also affects the performance of block jobs and thus BLOCK_JOB_READY events for driver-mirror and active block-commit commands now arrives faster, before QMP send successfully returns to caller (tests 141 144). Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468870792-7411-5-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2016-07-18 18:19:01 -04:00
Alberto Garcia	fd62c609ed	commit: Add 'job-id' parameter to 'block-commit' This patch adds a new optional 'job-id' parameter to 'block-commit', allowing the user to specify the ID of the block job to be created. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	2323322ed0	stream: Add 'job-id' parameter to 'block-stream' This patch adds a new optional 'job-id' parameter to 'block-stream', allowing the user to specify the ID of the block job to be created. The HMP 'block_stream' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	70559d499c	backup: Add 'job-id' parameter to 'blockdev-backup' and 'drive-backup' This patch adds a new optional 'job-id' parameter to 'blockdev-backup' and 'drive-backup', allowing the user to specify the ID of the block job to be created. The HMP 'drive_backup' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	71aa98678c	mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror' This patch adds a new optional 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror', allowing the user to specify the ID of the block job to be created. The HMP 'drive_mirror' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Alberto Garcia	29338003c9	stream: Fix prototype of stream_start() 'stream-start' has a parameter called 'backing-file', which is the string to be written to bs->backing when the job finishes. In the stream_start() implementation it is called 'backing_file_str', but it the prototype in the header file it is called 'base_id'. This patch fixes it so the name is the same in both cases and is consistent with other cases (like commit_start()). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Kevin Wolf	a03ef88f77	block: Convert bdrv_co_preadv/pwritev to BdrvChild This is the final patch for converting the common I/O path to take a BdrvChild parameter instead of BlockDriverState. The completion of this conversion means that all users that perform I/O on an image need to actually hold a reference (in the form of BdrvChild, possible as part of a BlockBackend) to that image. This also protects against inconsistent use of BlockBackend vs. BlockDriverState functions because direct use of a BlockDriverState isn't possible any more and blk->root is private for block-backends.c. In addition, we can now distinguish different users in the I/O path, and the future op blockers work is going to add assertions based on permissions stored in BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Eric Blake	5411541270	block: Use bool as appropriate for BDS members Using int for values that are only used as booleans is confusing. While at it, rearrange a couple of members so that all the bools are contiguous. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	a5b8dd2ce8	block: Move request_alignment into BlockLimit It makes more sense to have ALL block size limit constraints in the same struct. Improve the documentation while at it. Simplify a couple of conditionals, now that we have audited and documented that request_alignment is always non-zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:26 +02:00
Eric Blake	b9f7855a50	block: Switch discard length bounds to byte-based Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_discard and discard_alignment. Rename them, using 'pdiscard' as an aid to track which remaining discard interfaces need conversion, and so that the compiler will help us catch the change in semantics across any rebased code. The BlockLimits type is now completely byte-based; and in iscsi.c, sector_limits_lun2qemu() is no longer needed. pdiscard_alignment is made unsigned (we use power-of-2 alignments as bitmasks, where unsigned is easier to think about) while leaving max_pdiscard signed (since we still have an 'int' interface); this is comparable to what commit `cf081fc` did for write zeroes limits. We may later want to make everything an unsigned 64-bit limit - but that requires a bigger code audit. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	29cc6a6834	block: Wording tweaks to write zeroes limits Improve the documentation of the write zeroes limits, to mention additional constraints that drivers should observe. Worth squashing into commit `cf081fca`, if that hadn't been pushed already :) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	5def6b80e1	block: Switch transfer length bounds to byte-based Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_transfer_length and opt_transfer_length. Rename them (dropping the _length suffix) so that the compiler will help us catch the change in semantics across any rebased code, and improve the documentation. Use unsigned values, so that we don't have to worry about negative values and so that bit-twiddling is easier; however, we are still constrained by 2^31 of signed int in most APIs. When a value comes from an external source (iscsi and raw-posix), sanitize the results to ensure that opt_transfer is a power of 2. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Stefan Hajnoczi	e8a095dadb	block: use safe iteration over AioContext notifiers It's possible that an AioContext notifier user was close to finishing when .detach_aio_context() or .attached_aio_context() is called. In that case they may call bdrv_remove_aio_context_notifier() during the callback. Use safe iteration to avoid crashing when the notifier list is modified during iteration. We must not only handle the case where the current aio notifier is removed during a callback but also the one where any other aio notifier is removed. The next patch adds an AioContext notifier for block jobs and they really could be terminating just as .detach_aio_context() is invoked. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com	2016-06-20 14:25:41 +01:00
Max Reitz	274fccee2b	block/mirror: Fix target backing BDS Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Kevin Wolf	c9d20029f4	block: Remove bs->zero_beyond_eof It is always true for open images now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	1a8ae82217	block: Make bdrv_load/save_vmstate coroutine_fns This allows drivers to share code between normal I/O and vmstate accesses. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	5ddda0b8f0	block: Make .bdrv_load_vmstate() vectored This brings it in line with .bdrv_save_vmstate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Eric Blake	c1499a5e73	block: Kill bdrv_co_write_zeroes() Now that all drivers have been converted to a byte interface, we no longer need a sector interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	d05aa8bb4a	block: Add .bdrv_co_pwrite_zeroes() Update bdrv_co_do_write_zeroes() to be byte-based, and select between the new byte-based bdrv_co_pwrite_zeroes() or the old bdrv_co_write_zeroes(). The next patches will convert drivers, then remove the old interface. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	cf081fca4e	block: Track write zero limits in bytes Another step towards removing sector-based interfaces: convert the maximum write and minimum alignment values from sectors to bytes. Rename the variables to let the compiler check that all users are converted to the new semantics. The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS is constrained by INT_MAX (this means that we can't even support a 2G write_zeroes, but just under it) - changing operation lengths to unsigned or to 64-bits is a much bigger audit, and debatable if we even want to do it (since at the core, a 32-bit platform will still have ssize_t as its underlying limit on write()). Meanwhile, alignment is changed to 'uint32_t', since it makes no sense to have an alignment larger than the maximum write, and less painful to use an unsigned type with well-defined behavior in bit operations than to have to worry about what happens if a driver mistakenly supplies a negative alignment. Add an assert that no one was trying to use sectors to get a write zeroes larger than 2G, and therefore that a later conversion to bytes won't be impacted by keeping the limit at 32 bits. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Kevin Wolf	36fe13317b	block: Fix reconfiguring graph with drained nodes When changing the BlockDriverState that a BdrvChild points to while the node is currently drained, we must call the .drained_end() parent callback. Conversely, when this means attaching a new node that is already drained, we need to call .drained_begin(). bdrv_root_attach_child() takes now an opaque parameter, which is needed because the callbacks must also be called if we're attaching a new child to the BlockBackend when the root node is already drained, and they need a way to identify the BlockBackend. Previously, child->opaque was set too late and the callbacks would still see it as NULL. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Kevin Wolf	1f0c461b82	block: Remove BlockDriverState.blk This patch removes the remaining users of bs->blk, which will allow us to have multiple BBs on top of a single BDS. In the meantime, all checks that are currently in place to prevent the user from creating such setups can be switched to bdrv_has_blk() instead of accessing BDS.blk. Future patches can allow them and e.g. enable users to mirror to a block device that already has a BlockBackend on it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	4c265bf9f4	block: User BdrvChild callback for device name In order to get rid of bs->blk for bdrv_get_device_name() and bdrv_get_device_or_node_name(), ask all parents for their name and simply pick the first one. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	5c8cab4808	block: Use BdrvChild callbacks for change_media/resize We want to get rid of BlockDriverState.blk in order to allow multiple BlockBackends per BDS. Converting the device callbacks in block.c (which assume a single BlockBackend) to per-child callbacks gets us rid of the first few instances. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	7ca7f0f6db	block: Decouple throttling from BlockDriverState This moves the throttling related part of the BDS life cycle management to BlockBackend. The throttling group reference is now kept even when no medium is inserted. With this commit, throttling isn't disabled and then re-enabled any more during graph reconfiguration. This fixes the temporary breakage of I/O throttling when used with live snapshots or block jobs that manipulate the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	c2066af051	block: Drain throttling queue with BdrvChild callback This removes the last part of I/O throttling from block/io.c and moves it to the BlockBackend. Instead of having knowledge about throttling inside io.c, we can call a BdrvChild callback .drained_begin/end, which happens to drain the throttled requests for BlockBackend parents. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	22aa8b246a	block: Introduce BdrvChild.opaque BlockBackends use it to get a back pointer from BdrvChild to BlockBackend in any BdrvChildRole callbacks. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	97148076e8	block: Move I/O throttling configuration functions to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	27ccdd5259	block: Move throttling fields from BDS to BB This patch changes where the throttling state is stored (used to be the BlockDriverState, now it is the BlockBackend), but it doesn't actually make it a BB level feature yet. For example, throttling is still disabled when the BDS is detached from the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	31dce3ccca	block: throttle-groups: Use BlockBackend pointers internally As a first step towards moving I/O throttling to the BlockBackend level, this patch changes all pointers in struct ThrottleGroup from referencing a BlockDriverState to referencing a BlockBackend. This change is valid because we made sure that throttling can only be enabled on BDSes which have a BB attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Wen Congyang	e06018ad28	Add new block driver interface to add/delete a BDS's child In some cases, we want to take a quorum child offline, and take another child online. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-05-12 15:33:23 +02:00
Eric Blake	465fe887cc	block: Honor BDRV_REQ_FUA during write_zeroes The block layer has a couple of cases where it can lose Force Unit Access semantics when writing a large block of zeroes, such that the request returns before the zeroes have been guaranteed to land on underlying media. SCSI does not support FUA during WRITESAME(10/16); FUA is only supported if it falls back to WRITE(10/16). But where the underlying device is new enough to not need a fallback, it means that any upper layer request with FUA semantics was silently ignoring BDRV_REQ_FUA. Conversely, NBD has situations where it can support FUA but not ZERO_WRITE; when that happens, the generic block layer fallback to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu 2.6) was losing the FUA flag. The problem of losing flags unrelated to ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since commit `aa7bfbff`, but back then, it did not matter because there was no FUA flag. It became observable when commit `93f5e6d8` paved the way for flags that can impact correctness, when we should have been using bdrv_co_writev_flags() with modified flags. Compare to commit `9eeb6dd`, which got flag manipulation right in bdrv_co_do_zero_pwritev(). Symptoms: I tested with qemu-io with default writethrough cache (which is supposed to use FUA semantics on every write), and targetted an NBD client connected to a server that intentionally did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512', the NBD client sent two operations (NBD_CMD_WRITE then NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE. The fix is do to a cleanup bdrv_co_flush() at the end of the operation if any step in the middle relied on a BDS that does not natively support FUA for that step (note that we don't need to flush after every operation, if the operation is broken into chunks based on bounce-buffer sizing). Each BDS gains a new flag .supported_zero_flags, which parallels the use of .supported_write_flags but only when accessing a zero write operation (the flags MUST be different, because of SCSI having different semantics based on WRITE vs. WRITESAME; and also because BDRV_REQ_MAY_UNMAP only makes sense on zero writes). Also fix some documentation to describe -ENOTSUP semantics, particularly since iscsi depends on those semantics. Down the road, we may want to add a driver where its .bdrv_co_pwritev() honors all three of BDRV_REQ_FUA, BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise this via bs->supported_write_flags for blocks opened by that driver; such a driver should NOT supply .bdrv_co_write_zeroes nor .supported_zero_flags. But none of the drivers touched in this patch want to do that (the act of writing zeroes is different enough from normal writes to deserve a second callback). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	4df863f336	block: Make supported_write_flags a per-bds property Pre-patch, .supported_write_flags lives at the driver level, which means we are blindly declaring that all block devices using a given driver will either equally support FUA, or that we need a fallback at the block layer. But there are drivers where FUA support is a per-block decision: the NBD block driver is dependent on the remote server advertising NBD_FLAG_SEND_FUA (and has fallback code to duplicate the flush that the block layer would do if NBD had not set .supported_write_flags); and the iscsi block driver is dependent on the mode sense bits advertised by the underlying device (and is currently silently ignoring FUA requests if the underlying device does not support FUA). The fix is to make supported flags as a per-BDS option, set during .bdrv_open(). This patch moves the variable and fixes NBD and iscsi to set it only conditionally; later patches will then further simplify the NBD driver to quit duplicating work done at the block layer, as well as tackle the fact that SCSI does not support FUA semantics on WRITESAME(10/16) but only on WRITE(10/16). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Janne Karhunen	f249924e96	Allow users to specify the vmdk virtual hardware version. Vmdk images have metadata to indicate the vmware virtual hardware version image was created/tested to run with. Allow users to specify that version via new 'hwversion' option. [ kwolf: Adjust qemu-iotests common.filter ] Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	e3ddef25e9	block: Remove BlockDriver.bdrv_read/write There are no block drivers left that implement the old .bdrv_read/write interface, so it can be removed now. This gets us rid of the corresponding emulation functions, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3fb06697ae	block: Introduce .bdrv_co_preadv/pwritev BlockDriver function Many parts of the block layer are already byte granularity. The block driver interface, however, was still missing an interface that allows making use of this. This patch introduces a new BlockDriver interface, which is based on coroutines, vectored, has flags and uses a byte granularity. This is now the preferred interface for new drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	cab3a3563c	block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev It used to be an internal helper function just for implementing bdrv_co_do_readv/writev(), but now that it's a public interface, it deserves a name without "do" in it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Paolo Bonzini	6b98bd6495	block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Extract the handling of io_plug "depth" from linux-aio.c and let the main bdrv_drain loop do nothing but wait on I/O. Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug now operate on all children. The visit order is now symmetrical between plug and unplug, making it possible for formats to implement plug/unplug. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	ce0f141259	block: introduce bdrv_no_throttling_begin/end Extract the handling of throttling from bdrv_flush_io_queue. These new functions will soon become BdrvChildRole callbacks, as they can be generalized to "beginning of drain" and "end of drain". Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	93f5e6d88a	block: Introduce bdrv_co_writev_flags() This function will allow drivers to implement BDRV_REQ_FUA natively instead of sending a separate flush after the write. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	bfd18d1e0b	block: Move enable_write_cache to BB level Whether a write cache is used or not is a decision that concerns the user (e.g. the guest device) rather than the backend. It was already logically part of the BB level as bdrv_move_feature_fields() always kept it on top of the BDS tree; with this patch, the core of it (the actual flag and the additional flushes) is also implemented there. Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs doesn't have a BlockBackend attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Pavel Dovgalyuk	c32b82afaf	block: add flush callback This patch adds callback for flush request. This callback is responsible for flushing whole block devices stack. bdrv_flush function does not proceed to underlying devices. It should be performed by this callback function, if needed. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:12:15 +02:00
Kevin Wolf	72f41b6fbd	block: Remove blk_set_bs() The function is unused since commit `f21d96d0` ('block: Use BdrvChild in BlockBackend'). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	a8823a3bfd	block: Use blk_co_pwritev() for blk_write() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00

1 2 3 4

199 Commits