qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Philippe Mathieu-Daudé	31e404151b	cutils: Move size_to_str() from "qemu-common.h" to "qemu/cutils.h" "qemu/cutils.h" contains various qemu_strtosz_*() functions useful to convert strings to size. It seems natural to have the opposite usage (from size to string) there too. The function definition is already in util/cutils.c. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190903120555.7551-1-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-09-19 11:57:34 +02:00
Maxim Levitsky	603fbd076c	block/qcow2: refactor encryption code * Change the qcow2_co_{encrypt\|decrypt} to just receive full host and guest offsets and use this function directly instead of calling do_perform_cow_encrypt (which is removed by that patch). * Adjust qcow2_co_encdec to take full host and guest offsets as well. * Document the qcow2_co_{encrypt\|decrypt} arguments to prevent the bug fixed in former commit from hopefully happening again. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190915203655.21638-3-mlevitsk@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [mreitz: Let perform_cow() return the error value returned by qcow2_co_encrypt(), as proposed by Vladimir] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:36:22 +02:00
Maxim Levitsky	38e7d54bdc	block/qcow2: Fix corruption introduced by commit `8ac0f15f33` This fixes subtle corruption introduced by luks threaded encryption in commit `8ac0f15f33` Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922 The corruption happens when we do a write that * writes to two or more unallocated clusters at once * doesn't fully cover the first sector * doesn't fully cover the last sector * uses luks encryption In this case, when allocating the new clusters we COW both areas prior to the write and after the write, and we encrypt them. The above mentioned commit accidentally made it so we encrypt the second COW area using the physical cluster offset of the first area. The problem is that offset_in_cluster in do_perform_cow_encrypt can be larger that the cluster size, thus cluster_offset will no longer point to the start of the cluster at which encrypted area starts. Next patch in this series will refactor the code to avoid all these assumptions. In the bugreport that was triggered by rebasing a luks image to new, zero filled base, which lot of such writes, and causes some files with zero areas to contain garbage there instead. But as described above it can happen elsewhere as well Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190915203655.21638-2-mlevitsk@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:35:02 +02:00
Max Reitz	c34dc07f9f	curl: Check curl_multi_add_handle()'s return code If we had done that all along, debugging would have been much simpler. (Also, I/O errors are better than hangs.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:12 +02:00
Max Reitz	bfb23b480a	curl: Handle success in multi_check_completion Background: As of cURL 7.59.0, it verifies that several functions are not called from within a callback. Among these functions is curl_multi_add_handle(). curl_read_cb() is a callback from cURL and not a coroutine. Waking up acb->co will lead to entering it then and there, which means the current request will settle and the caller (if it runs in the same coroutine) may then issue the next request. In such a case, we will enter curl_setup_preadv() effectively from within curl_read_cb(). Calling curl_multi_add_handle() will then fail and the new request will not be processed. Fix this by not letting curl_read_cb() wake up acb->co. Instead, leave the whole business of settling the AIOCB objects to curl_multi_check_completion() (which is called from our timer callback and our FD handler, so not from any cURL callbacks). Reported-by: Natalie Gavrielov <ngavrilo@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-7-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	9abaf9fc47	curl: Report only ready sockets Instead of reporting all sockets to cURL, only report the one that has caused curl_multi_do_locked() to be called. This lets us get rid of the QLIST_FOREACH_SAFE() list, which was actually wrong: SAFE foreaches are only safe when the current element is removed in each iteration. If it possible for the list to be concurrently modified, we cannot guarantee that only the current element will be removed. Therefore, we must not use QLIST_FOREACH_SAFE() here. Fixes: `ff5ca1664a` Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	9dbad87d25	curl: Pass CURLSocket to curl_multi_do() curl_multi_do_locked() currently marks all sockets as ready. That is not only inefficient, but in fact unsafe (the loop is). A follow-up patch will change that, but to do so, curl_multi_do_locked() needs to know exactly which socket is ready; and that is accomplished by this patch here. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	948403bcb1	curl: Check completion in curl_multi_do() While it is more likely that transfers complete after some file descriptor has data ready to read, we probably should not rely on it. Better be safe than sorry and call curl_multi_check_completion() in curl_multi_do(), too, just like it is done in curl_multi_read(). With this change, curl_multi_do() and curl_multi_read() are actually the same, so drop curl_multi_read() and use curl_multi_do() as the sole FD handler. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	007f339b10	curl: Keep *socket until the end of curl_sock_cb() This does not really change anything, but it makes the code a bit easier to follow once we use @socket as the opaque pointer for aio_set_fd_handler(). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	0487861685	curl: Keep pointer to the CURLState in CURLSocket A follow-up patch will make curl_multi_do() and curl_multi_read() take a CURLSocket instead of the CURLState. They still need the latter, though, so add a pointer to it to the former. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190910124136.10565-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Nir Soffer	1bbbf32d5f	block: Use QEMU_IS_ALIGNED Replace instances of: (n & (BDRV_SECTOR_SIZE - 1)) == 0 And: (n & ~BDRV_SECTOR_MASK) == 0 With: QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE) Which reveals the intent of the code better, and makes it easier to locate the code checking alignment. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827185913.27427-2-nsoffer@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 14:48:30 +02:00
Alberto Garcia	bf3d78ae55	qcow2: Stop overwriting compressed clusters one by one handle_alloc() tries to find as many contiguous clusters that need copy-on-write as possible in order to allocate all of them at the same time. However, compressed clusters are only overwritten one by one, so let's say that we have an image with 1024 consecutive compressed clusters: qemu-img create -f qcow2 hd.qcow2 64M for f in `seq 0 64 65472`; do qemu-io -c "write -c ${f}k 64k" hd.qcow2 done In this case trying to overwrite the whole image with one large write request results in 1024 separate allocations: qemu-io -c "write 0 64M" hd.qcow2 This restriction comes from commit `095a9c58ce` from 2008. Nowadays QEMU can overwrite multiple compressed clusters just fine, and in fact it already does: as long as the first cluster that handle_alloc() finds is not compressed, all other compressed clusters in the same batch will be overwritten in one go: qemu-img create -f qcow2 hd.qcow2 64M qemu-io -c "write -z 0 64k" hd.qcow2 for f in `seq 64 64 65472`; do qemu-io -c "write -c ${f}k 64k" hd.qcow2 done Compared to the previous one, overwriting this image on my computer goes from 8.35s down to 230ms. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: John Snow <jsnow@redhat.com Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:37 +02:00
Philippe Mathieu-Daudé	d90d5cae2b	block/create: Do not abort if a block driver is not available The 'blockdev-create' QMP command was introduced as experimental feature in commit `b0292b851b`, using the assert() debug call. It got promoted to 'stable' command in `3fb588a0f2`, but the assert call was not removed. Some block drivers are optional, and bdrv_find_format() might return a NULL value, triggering the assertion. Stable code is not expected to abort, so return an error instead. This is easily reproducible when libnfs is not installed: ./configure [...] module support no Block whitelist (rw) Block whitelist (ro) libiscsi support yes libnfs support no [...] Start QEMU: $ qemu-system-x86_64 -S -qmp unix:/tmp/qemu.qmp,server,nowait Send the 'blockdev-create' with the 'nfs' driver: $ ( cat << 'EOF' {'execute': 'qmp_capabilities'} {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} EOF ) \| socat STDIO UNIX:/tmp/qemu.qmp {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 4}, "package": "v4.1.0-733-g89ea03a7dc"}, "capabilities": ["oob"]}} {"return": {}} QEMU crashes: $ gdb qemu-system-x86_64 core Program received signal SIGSEGV, Segmentation fault. (gdb) bt #0 0x00007ffff510957f in raise () at /lib64/libc.so.6 #1 0x00007ffff50f3895 in abort () at /lib64/libc.so.6 #2 0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6 #4 0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69 #5 0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314 #6 0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131 #7 0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174 With this patch applied, QEMU returns a QMP error: {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} {"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}} Cc: qemu-stable@nongnu.org Reported-by: Xu Tian <xutian@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:37 +02:00
Peter Lieven	d2c6becbe0	block/nfs: add support for nfs_umount libnfs recently added support for unmounting. Add support in Qemu too. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:15 +02:00
Peter Lieven	601dc65597	block/nfs: tear down aio before nfs_close nfs_close is a sync call from libnfs and has its own event handler polling on the nfs FD. Avoid that both QEMU and libnfs are intefering here. CC: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:14 +02:00
Max Reitz	1a37e31244	vpc: Return 0 from vpc_co_create() on success blockdev_create_run() directly uses .bdrv_co_create()'s return value as the job's return value. Jobs must return 0 on success, not just any nonnegative value. Therefore, using blockdev-create for VPC images may currently fail as the vpc driver may return a positive integer. Because there is no point in returning a positive integer anywhere in the block layer (all non-negative integers are generally treated as complete success), we probably do not want to add more such cases. Therefore, fix this problem by making the vpc driver always return 0 in case of success. Suggested-by: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Kevin Wolf	effecce6bc	file-posix: Fix has_write_zeroes after NO_FALLBACK If QEMU_AIO_NO_FALLBACK is given, we always return failure and don't even try to use the BLKZEROOUT ioctl. In this failure case, we shouldn't disable has_write_zeroes because we didn't learn anything about the ioctl. The next request might not set QEMU_AIO_NO_FALLBACK and we can still use the ioctl then. Fixes: `738301e117` Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-09-10 08:58:43 +02:00
Max Reitz	b2c6f23f4a	block/file-posix: Reduce xfsctl() use This patch removes xfs_write_zeroes() and xfs_discard(). Both functions have been added just before the same feature was present through fallocate(): - fallocate() has supported PUNCH_HOLE for XFS since Linux 2.6.38 (March 2011); xfs_discard() was added in December 2010. - fallocate() has supported ZERO_RANGE for XFS since Linux 3.15 (June 2014); xfs_write_zeroes() was added in November 2013. Nowadays, all systems that qemu runs on should support both fallocate() features (RHEL 7's kernel does). xfsctl() is still useful for getting the request alignment for O_DIRECT, so this patch does not remove our dependency on it completely. Note that xfs_write_zeroes() had a bug: It calls ftruncate() when the file is shorter than the specified range (because ZERO_RANGE does not increase the file length). ftruncate() may yield and then discard data that parallel write requests have written past the EOF in the meantime. Dropping the function altogether fixes the bug. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: `50ba5b2d99` Reported-by: Lukáš Doktor <ldoktor@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Vladimir Sementsov-Ogievskiy	bb0c940993	job: drop job_drain In job_finish_sync job_enter should be enough for a job to make some progress and draining is a wrong tool for it. So use job_enter directly here and drop job_drain with all related staff not used more. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: John Snow <jsnow@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Alberto Garcia	b70d08205b	qcow2: Fix the calculation of the maximum L2 cache size The size of the qcow2 L2 cache defaults to 32 MB, which can be easily larger than the maximum amount of L2 metadata that the image can have. For example: with 64 KB clusters the user would need a qcow2 image with a virtual size of 256 GB in order to have 32 MB of L2 metadata. Because of that, since commit `b749562d98` we forbid the L2 cache to become larger than the maximum amount of L2 metadata for the image, calculated using this formula: uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8); The problem with this formula is that the result should be rounded up to the cluster size because an L2 table on disk always takes one full cluster. For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if the last 32 KB of those are not going to be used. However QEMU rounds the numbers down and only creates 2 cache tables (128 KB), which is not enough for the image. A quick test doing 4KB random writes on a 1280 MB image gives me around 500 IOPS, while with the correct cache size I get 16K IOPS. Cc: qemu-stable@nongnu.org Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Eric Blake	f061656cc3	nbd: Implement client use of NBD FAST_ZERO The client side is fairly straightforward: if the server advertised fast zero support, then we can map that to BDRV_REQ_NO_FALLBACK support. A server that advertises FAST_ZERO but not WRITE_ZEROES is technically broken, but we can ignore that situation as it does not change our behavior. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-09-05 16:03:26 -05:00
Andrey Shinkevich	294682cc3a	block: workaround for unaligned byte range in fallocate() Revert the commit `118f99442d` 'block/io.c: fix for the allocation failure' and use better error handling for file systems that do not support fallocate() for an unaligned byte range. Allow falling back to pwrite in case fallocate() returns EINVAL. Suggested-by: Kevin Wolf <kwolf@redhat.com> Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <1566913973-15490-1-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-09-05 16:01:31 -05:00
Eric Blake	df18c04edf	nbd: Use g_autofree in a few places Thanks to our recent move to use glib's g_autofree, I can join the bandwagon. Getting rid of gotos is fun ;) There are probably more places where we could register cleanup functions and get rid of more gotos; this patch just focuses on the labels that existed merely to call g_free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190824172813.29720-2-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-09-05 15:52:45 -05:00
Stefan Hajnoczi	236094c738	file-posix: fix request_alignment typo Fixes: `a6b257a08e` ("file-posix: Handle undetectable alignment") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190827101328.4062-1-stefanha@redhat.com Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	bedb8bb419	vmdk: Reject invalid compressed writes Compressed writes generally have to write full clusters, not just in theory but also in practice when it comes to vmdk's streamOptimized subformat. It currently is just silently broken for writes with non-zero in-cluster offsets: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 4096 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec) read failed: Invalid argument (The technical reason is that vmdk_write_extent() just writes the incomplete compressed data actually to offset 4k. When reading the data, vmdk_read_extent() looks at offset 0 and finds the compressed data size to be 0, because that is what it reads from there. This yields an error.) For incomplete writes with zero in-cluster offsets, the error path when reading the rest of the cluster is a bit different, but the result is the same: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec) read failed: Invalid argument (Here, vmdk_read_extent() finds the data and then sees that the uncompressed data is short.) It is better to reject invalid writes than to make the user believe they might have succeeded and then fail when trying to read it back. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190815153638.4600-5-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	cdc0dd2586	vmdk: Use bdrv_dirname() for relative extent paths This makes iotest 033 pass with e.g. subformat=monolithicFlat. It also turns a former error in 059 into success. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190815153638.4600-3-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Nir Soffer	3a20013fbb	block: posix: Always allocate the first block When creating an image with preallocation "off" or "falloc", the first block of the image is typically not allocated. When using Gluster storage backed by XFS filesystem, reading this block using direct I/O succeeds regardless of request length, fooling alignment detection. In this case we fallback to a safe value (4096) instead of the optimal value (512), which may lead to unneeded data copying when aligning requests. Allocating the first block avoids the fallback. Since we allocate the first block even with preallocation=off, we no longer create images with zero disk size: $ ./qemu-img create -f raw test.raw 1g Formatting 'test.raw', fmt=raw size=1073741824 $ ls -lhs test.raw 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw And converting the image requires additional cluster: $ ./qemu-img measure -f raw -O qcow2 test.raw required size: 458752 fully allocated size: 1074135040 When using format like vmdk with multiple files per image, we allocate one block per file: $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat $ ls -lhs test*.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 353 Aug 27 03:23 test.vmdk I did quick performance test for copying disks with qemu-img convert to new raw target image to Gluster storage with sector size of 512 bytes: for i in $(seq 10); do rm -f dst.raw sleep 10 time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw done Here is a table comparing the total time spent: Type Before(s) After(s) Diff(%) --------------------------------------- real 530.028 469.123 -11.4 user 17.204 10.768 -37.4 sys 17.881 7.011 -60.7 We can see very clear improvement in CPU usage. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827010528.8818-2-nsoffer@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Vladimir Sementsov-Ogievskiy	5396234b96	block/qcow2: implement .bdrv_co_pwritev(_compressed)_part Implement and use new interface to get rid of hd_qiov. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-13-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-13-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	df893d25ce	block/qcow2: implement .bdrv_co_preadv_part Implement and use new interface to get rid of hd_qiov. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-12-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-12-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	00721a3529	block/qcow2: refactor qcow2_co_preadv to use buffer-based io Use buffer based io in encrypted case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-11-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-11-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	1acc3466a2	block/io: introduce bdrv_co_p{read, write}v_part Introduce extended variants of bdrv_co_preadv and bdrv_co_pwritev with qiov_offset parameter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-10-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-10-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	28c4da2869	block/io: bdrv_aligned_pwritev: use and support qiov_offset Use and support new API in bdrv_aligned_pwritev. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-9-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-9-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	65cd4424b9	block/io: bdrv_aligned_preadv: use and support qiov_offset Use and support new API in bdrv_co_do_copy_on_readv. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-8-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-8-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	2275cc90a1	block/io: bdrv_co_do_copy_on_readv: lazy allocation Allocate bounce_buffer only if it is really needed. Also, sub-optimize allocation size (why not?). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-7-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-7-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	1143ec5ebf	block/io: bdrv_co_do_copy_on_readv: use and support qiov_offset Use and support new API in bdrv_co_do_copy_on_readv. Note that in case of allocated-in-top we need to shrink read size to MIN(..) by hand, as pre-patch this was actually done implicitly by qemu_iovec_concat (and we used local_qiov.size). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-6-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-6-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	ac850bf099	block: define .*_part io handlers in BlockDriver Add handlers supporting qiov_offset parameter: bdrv_co_preadv_part bdrv_co_pwritev_part bdrv_co_pwritev_compressed_part This is used to reduce need of defining local_qiovs and hd_qiovs in all corners of block layer code. The following patches will increase usage of this new API part by part. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-5-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-5-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	7a3f542fbd	block/io: refactor padding We have similar padding code in bdrv_co_pwritev, bdrv_co_do_pwrite_zeroes and bdrv_co_preadv. Let's combine and unify it. [Squashed in Vladimir's qemu-iotests 077 fix --Stefan] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-4-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-4-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:12 +01:00
Vladimir Sementsov-Ogievskiy	f76889e7b9	util/iov: improve qemu_iovec_is_zero We'll need to check a part of qiov soon, so implement it now. Optimization with align down to 4 * sizeof(long) is dropped due to: 1. It is strange: it aligns length of the buffer, but where is a guarantee that buffer pointer is aligned itself? 2. buffer_is_zero() is a better place for optimizations and it has them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-3-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-3-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:52:45 +01:00
Max Reitz	fbc8e1b7e4	vpc: Do not return RAW from block_status vpc is not really a passthrough driver, even when using the fixed subformat (where host and guest offsets are equal). It should handle preallocation like all other drivers do, namely by returning DATA \| RECURSE instead of RAW. There is no tangible difference but the fact that bdrv_is_allocated() no longer falls through to the protocol layer. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-4-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	4dd84ac9a7	vmdk: Make block_status recurse for flat extents Fixes: `69f47505ee` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-3-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	ad6434dc62	vdi: Make block_status recurse for fixed images Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Fixes: `69f47505ee` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-2-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	9956688a8f	vhdx: Fix .bdrv_has_zero_init() Fixed VHDX images cannot guarantee to be zero-initialized. If the image has the "fixed" subformat, forward the call to the underlying storage node. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-9-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	0a28bf2826	vdi: Fix .bdrv_has_zero_init() Static VDI images cannot guarantee to be zero-initialized. If the image has been statically allocated, forward the call to the underlying storage node. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190724171239.8764-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	38841dcd27	qcow2: Fix .bdrv_has_zero_init() If a qcow2 file is preallocated, it can no longer guarantee that it initially appears as filled with zeroes. So implement .bdrv_has_zero_init() by checking whether the file is preallocated; if so, forward the call to the underlying storage node, except for when it is encrypted: Encrypted preallocated images always return effectively random data, so .bdrv_has_zero_init() must always return 0 for them. .bdrv_has_zero_init_truncate() can remain bdrv_has_zero_init_1(), because it presupposes PREALLOC_MODE_OFF. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-7-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	b647d69adc	block: Use bdrv_has_zero_init_truncate() vhdx and parallels call bdrv_has_zero_init() when they do not really care about an image's post-create state but only about what happens when you grow an image. That is a bit ugly, and also overly safe when growing preallocated images without preallocating the new areas. Let them use bdrv_has_zero_init_truncate() instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> [mreitz: Added commit message] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	1dcaf52760	block: Implement .bdrv_has_zero_init_truncate() We need to implement .bdrv_has_zero_init_truncate() for every block driver that supports truncation and has a .bdrv_has_zero_init() implementation. Implement it the same way each driver implements .bdrv_has_zero_init(). This is at least not any more unsafe than what we had before. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	cdf3bc934a	mirror: Fix bdrv_has_zero_init() use bdrv_has_zero_init() only has meaning for newly created images or image areas. If the mirror job itself did not create the image, it cannot rely on bdrv_has_zero_init()'s result to carry any meaning. This is the case for drive-mirror with mode=existing and always for blockdev-mirror. Note that we only have to zero-initialize the target with sync=full, because other modes actually do not promise that the target will contain the same data as the source after the job -- sync=top only promises to copy anything allocated in the top layer, and sync=none will only copy new I/O. (Which is how mirror has always handled it.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Maxim Levitsky	672de729a1	LUKS: support preallocation preallocation=off and preallocation=metadata both allocate luks header only, and preallocation=falloc/full is passed to underlying file. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190716161901.1430-1-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Peter Maydell	3fbd3405d2	- Run the iotest during "make check" -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAl1XvtURHHRodXRoQHJl ZGhhdC5jb20ACgkQLtnXdP5wLbUqqRAAqfjPRPtXSZxaxl63O5EADugLY72s+04i Zc5MC4Ivwj0x1WA6JIG77kz6xmjIOizRIkF1jkyEkG+AgIrjw3rdYhCD4Iav0n/v nqltkOaf1FzdQYCHUTn0WUYn7Df2bSkjSTPhnqbCaGq5WjXzgzi9jhCFlZpo374J 9yWk74nt3QlOUjLw6+cm0HxEf9IlRQdPwJNYJwsrYHgspJwcwAYJ7xaL34huoxkJ 10fA9q6QK1bh67nZpAJOte3wQ8r35cUT4ZaIiyO0MFMrEiLp4/1gKYpkZwWq0+iV 25rWVjogzRjp+LejMAltY9MmUekCl5ZzVVuhdt2jGPbNanzdHxHydYFUELP6WmrM zAyYQkDvG7JiY2F0M9rKJZOnpO2pYF2hxc/nqD04qF3HD0zG1eoIg056UlKcc5b/ kIgR2srlj0aHWhGJ2/DV3w5ZowjJGKBAYYxQdEJmiuLpGdOimSoWXbjacDEd4mUf DevXv6k9cAvexZxU4cOUzpip4U3MGC0rJ1BNIgTs6eIeKq3geROTpuHjJFSBHZiP H/MqtoT4xTXJYomc9MiZG90fG9KyywlEF6e0GjVIcadJEmFIbJ+DfrAknVRjuaij ThMSIuvMEpXFhyghlApURePNi8W3FIIYHISw0JE/u7+4/7L/iYDToYeM/o563+8O zbj0n9fSewI= =oVZx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-08-17' into staging - Run the iotest during "make check" # gpg: Signature made Sat 17 Aug 2019 09:46:13 BST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/huth-gitlab/tags/pull-request-2019-08-17: gitlab-ci: Remove qcow2 tests that are handled by "make check" already tests: Run the iotests during "make check" again block: fix NetBSD qemu-iotests failure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-19 14:14:09 +01:00
Paolo Bonzini	f6fc1e30cf	block: fix NetBSD qemu-iotests failure Opening a block device on NetBSD has an additional step compared to other OSes, corresponding to raw_normalize_devicepath. The error message in that function is slightly different from that in raw_open_common and this was causing spurious failures in qemu-iotests. However, in general it is not important to know what exact step was failing, for example in the qemu-iotests case the error message contains the fairly unequivocal "No such file or directory" text from strerror. We can thus fix the failures by standardizing on a single error message for both raw_open_common and raw_normalize_devicepath; in fact, we can even use error_setg_file_open to make sure the error message is the same as in the rest of QEMU. Message-Id: <20190725095920.28419-1-pbonzini@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2019-08-17 09:02:59 +02:00

1 2 3 4 5 ...

4448 Commits