qemu-e2k

Author	SHA1	Message	Date
Denis V. Lunev	ec050f77a5	block: process before_write_notifiers in bdrv_co_discard This is mandatory for correct backup creation. In the other case the content under this area would be lost. Dirty bits are set exactly like in bdrv_aligned_pwritev, i.e. they are set even if notifier has returned a error. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466093381-6120-4-git-send-email-den@openvz.org CC: Fam Zheng <famz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 11:44:12 +01:00
Denis V. Lunev	968d8b0627	block: fix race in bdrv_co_discard with drive-mirror Actually we must set dirty bitmap dirty after we have written all our zeroes for correct processing in drive mirror code. In the other case we can face not zeroes in this area in mirror_iteration. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466093381-6120-3-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 11:44:12 +01:00
Denis V. Lunev	3a36e474f2	block: fixed BdrvTrackedRequest filling in bdrv_co_discard The request area is specified in bytes, not in sectors. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1466093381-6120-2-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 11:44:12 +01:00
Paolo Bonzini	02d0e09503	os-posix: include sys/mman.h qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Max Reitz	67882b1535	block/null: Implement bdrv_refresh_filename() The null block driver ignores any filename used for creating its BDSs, which allows creating such BDSs even without any filename at all. In that case, we currently construct a JSON filename when queried instead of a plain "null-co://" or "null-aio://". This patch implements bdrv_refresh_filename() to remedy this behavior. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-4-mreitz@redhat.com [mreitz@redhat.com: Added commit message] Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Max Reitz	274fccee2b	block/mirror: Fix target backing BDS Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Vikhyat Umrao	87cd3d20e1	rbd:change error_setg() to error_setg_errno() Ceph RBD block driver does not use error_setg_errno() where it is possible to use. This patch replaces error_setg() from error_setg_errno(). Signed-off-by: Vikhyat Umrao <vumrao@redhat.com> Message-id: 1462780319-5796-1-git-send-email-vumrao@redhat.com Reviewed-by: Josh Durgin <jdurgin@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Alberto Garcia	834fe28ddf	block: Create the commit block job before reopening any image If the base or overlay images need to be reopened in read-write mode but the block_job_create() call fails then no one will put those images back in read-only mode. We can solve this problem easily by calling block_job_create() first. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: aa495045770a6f1a7cc5d408397a17c75097fdd8.1464346103.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Alberto Garcia	eb1364ceac	block: use the block job list in bdrv_drain_all() bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate over all top-level BlockDriverStates. Therefore the code is unable to find block jobs in other nodes. This patch uses block_job_next() to iterate over all block jobs. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 55ee7d7d4a65c28aa1a1b28823897ef326f328e2.1464346103.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-06-16 15:20:37 +02:00
Kevin Wolf	c9d20029f4	block: Remove bs->zero_beyond_eof It is always true for open images now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	734a77584a	qcow2: Let vmstate call qcow2_co_preadv/pwrite directly We don't really want to go through the block layer in order to read from or write to the vmstate in a qcow2 image. Doing so required a few ugly hacks like saving and restoring the old image size (because writing to vmstate offsets would increase the image size) or disabling the "reads after EOF = zeroes" logic. When calling the right functions directly, these hacks aren't necessary any more. Note that .bdrv_vmstate_load/save() return 0 instead of the number of bytes in case of success now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	1a8ae82217	block: Make bdrv_load/save_vmstate coroutine_fns This allows drivers to share code between normal I/O and vmstate accesses. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:56 +02:00
Kevin Wolf	b433d9424d	block: Allow .bdrv_load/save_vmstate() to return 0/-errno The return value of .bdrv_load/save_vmstate() can be any non-negative number in case of success now. It used to be bytes/-errno. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	5ddda0b8f0	block: Make .bdrv_load_vmstate() vectored This brings it in line with .bdrv_save_vmstate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	f1e8474115	block: Introduce bdrv_preadv() We already have a byte-based bdrv_pwritev(), but the read counterpart was still missing. This commit adds it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	ccb9dc1012	linux-aio: Cancel BH if not needed linux-aio uses a BH in order to make sure that the remaining completions are processed even in nested event loops of completion callbacks in order to avoid deadlocks. There is no need, however, to have the BH overhead for the first call into qemu_laio_completion_bh() or after all pending completions have already been processed. Therefore, this patch calls directly into qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels the BH after qemu_laio_completion_bh() has processed all pending completions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	23b0d9fb1d	block: Don't enforce 512 byte minimum alignment If block drivers say that they can do an alignment < 512 bytes, let's just suppose they mean it. raw-posix used to be an offender with respect to this, but it can actually deal with byte-aligned requests now. The default is still 512 bytes for any drivers that only implement sector-based interfaces, but it is 1 now for drivers that implement .bdrv_co_preadv. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	9d52aa3c38	raw-posix: Implement .bdrv_co_preadv/pwritev The raw-posix block driver actually supports byte-aligned requests now on non-O_DIRECT images, like it already (and previously incorrectly) claimed in bs->request_alignment. For some block drivers this means that a RMW cycle can be avoided when they write sub-sector metadata e.g. for cluster allocation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	2174f12bde	raw-posix: Switch to bdrv_co_* interfaces In order to use the modern byte-based .bdrv_co_preadv/pwritev() interface, this patch switches raw-posix to coroutine-based interfaces as a first step. In terms of semantics and performance, it doesn't make a difference with the existing code whether we go from a coroutine to a callback-based interface already in block/io.c or only in linux-aio.c As there have been concerns in the past that this change may be a step in the wrong direction with respect to a possible AIO fast path, the old callback-based interface for linux-aio is left around and can be reactivated when a fast path (e.g. directly from virtio-blk dataplane, bypassing the whole block layer) is implemented. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	9896c8765f	block: Prepare bdrv_aligned_pwritev() for byte-aligned requests Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	49c0752600	block: Prepare bdrv_aligned_preadv() for byte-aligned requests This patch makes bdrv_aligned_preadv() ready to accept byte-aligned requests. Note that this doesn't mean that such requests are actually made. The caller still ensures that all requests are aligned to at least 512 bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	244483e64e	block: Byte-based bdrv_co_do_copy_on_readv() In a first step to convert the common I/O path to work on bytes rather than sectors, this converts the copy-on-read logic that is used by bdrv_aligned_preadv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-16 15:19:55 +02:00
Daniel P. Berrange	8c0dcbc4ad	block: drop support for using qcow[2] encryption with system emulators Back in the 2.3.0 release we declared qcow[2] encryption as deprecated, warning people that it would be removed in a future release. commit `a1f688f415` Author: Markus Armbruster <armbru@redhat.com> Date: Fri Mar 13 21:09:40 2015 +0100 block: Deprecate QCOW/QCOW2 encryption The code still exists today, but by a (happy?) accident we entirely broke the ability to use qcow[2] encryption in the system emulators in the 2.4.0 release due to commit `8336aafae1` Author: Daniel P. Berrange <berrange@redhat.com> Date: Tue May 12 17:09:18 2015 +0100 qcow2/qcow: protect against uninitialized encryption key This commit was designed to prevent future coding bugs which might cause QEMU to read/write data on an encrypted block device in plain text mode before a decryption key is set. It turns out this preventative measure was a little too good, because we already had a long standing bug where QEMU read encrypted data in plain text mode during system emulator startup, in order to guess disk geometry: Thread 10 (Thread 0x7fffd3fff700 (LWP 30373)): #0 0x00007fffe90b1a28 in raise () at /lib64/libc.so.6 #1 0x00007fffe90b362a in abort () at /lib64/libc.so.6 #2 0x00007fffe90aa227 in __assert_fail_base () at /lib64/libc.so.6 #3 0x00007fffe90aa2d2 in () at /lib64/libc.so.6 #4 0x000055555587ae19 in qcow2_co_readv (bs=0x5555562accb0, sector_num=0, remaining_sectors=1, qiov=0x7fffffffd260) at block/qcow2.c:1229 #5 0x000055555589b60d in bdrv_aligned_preadv (bs=bs@entry=0x5555562accb0, req=req@entry=0x7fffd3ffea50, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x7fffffffd260, flags=0) at block/io.c:908 #6 0x000055555589b8bc in bdrv_co_do_preadv (bs=0x5555562accb0, offset=0, bytes=512, qiov=0x7fffffffd260, flags=<optimized out>) at block/io.c:999 #7 0x000055555589c375 in bdrv_rw_co_entry (opaque=0x7fffffffd210) at block/io.c:544 #8 0x000055555586933b in coroutine_thread (opaque=0x555557876310) at coroutine-gthread.c:134 #9 0x00007ffff64e1835 in g_thread_proxy (data=0x5555562b5590) at gthread.c:778 #10 0x00007ffff6bb760a in start_thread () at /lib64/libpthread.so.0 #11 0x00007fffe917f59d in clone () at /lib64/libc.so.6 Thread 1 (Thread 0x7ffff7ecab40 (LWP 30343)): #0 0x00007fffe91797a9 in syscall () at /lib64/libc.so.6 #1 0x00007ffff64ff87f in g_cond_wait (cond=cond@entry=0x555555e085f0 <coroutine_cond>, mutex=mutex@entry=0x555555e08600 <coroutine_lock>) at gthread-posix.c:1397 #2 0x00005555558692c3 in qemu_coroutine_switch (co=<optimized out>) at coroutine-gthread.c:117 #3 0x00005555558692c3 in qemu_coroutine_switch (from_=0x5555562b5e30, to_=to_@entry=0x555557876310, action=action@entry=COROUTINE_ENTER) at coroutine-gthread.c:175 #4 0x0000555555868a90 in qemu_coroutine_enter (co=0x555557876310, opaque=0x0) at qemu-coroutine.c:116 #5 0x0000555555859b84 in thread_pool_completion_bh (opaque=0x7fffd40010e0) at thread-pool.c:187 #6 0x0000555555859514 in aio_bh_poll (ctx=ctx@entry=0x5555562953b0) at async.c:85 #7 0x0000555555864d10 in aio_dispatch (ctx=ctx@entry=0x5555562953b0) at aio-posix.c:135 #8 0x0000555555864f75 in aio_poll (ctx=ctx@entry=0x5555562953b0, blocking=blocking@entry=true) at aio-posix.c:291 #9 0x000055555589c40d in bdrv_prwv_co (bs=bs@entry=0x5555562accb0, offset=offset@entry=0, qiov=qiov@entry=0x7fffffffd260, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:591 #10 0x000055555589c503 in bdrv_rw_co (bs=bs@entry=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:614 #11 0x000055555589c562 in bdrv_read_unthrottled (nb_sectors=21845, buf=0x7fffffffd2e0 "\321,", sector_num=0, bs=0x5555562accb0) at block/io.c:622 #12 0x000055555589c562 in bdrv_read_unthrottled (bs=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845) at block/io.c:634 nb_sectors@entry=1) at block/block-backend.c:504 #14 0x0000555555752e9f in guess_disk_lchs (blk=blk@entry=0x5555562a5290, pcylinders=pcylinders@entry=0x7fffffffd52c, pheads=pheads@entry=0x7fffffffd530, psectors=psectors@entry=0x7fffffffd534) at hw/block/hd-geometry.c:68 #15 0x0000555555752ff7 in hd_geometry_guess (blk=0x5555562a5290, pcyls=pcyls@entry=0x555557875d1c, pheads=pheads@entry=0x555557875d20, psecs=psecs@entry=0x555557875d24, ptrans=ptrans@entry=0x555557875d28) at hw/block/hd-geometry.c:133 #16 0x0000555555752b87 in blkconf_geometry (conf=conf@entry=0x555557875d00, ptrans=ptrans@entry=0x555557875d28, cyls_max=cyls_max@entry=65536, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7fffffffd5e0) at hw/block/block.c:71 #17 0x0000555555799bc4 in ide_dev_initfn (dev=0x555557875c80, kind=IDE_HD) at hw/ide/qdev.c:174 #18 0x0000555555768394 in device_realize (dev=0x555557875c80, errp=0x7fffffffd640) at hw/core/qdev.c:247 #19 0x0000555555769a81 in device_set_realized (obj=0x555557875c80, value=<optimized out>, errp=0x7fffffffd730) at hw/core/qdev.c:1058 #20 0x00005555558240ce in property_set_bool (obj=0x555557875c80, v=<optimized out>, opaque=0x555557875de0, name=<optimized out>, errp=0x7fffffffd730) at qom/object.c:1514 #21 0x0000555555826c87 in object_property_set_qobject (obj=obj@entry=0x555557875c80, value=value@entry=0x55555784bcb0, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/qom-qobject.c:24 #22 0x0000555555825760 in object_property_set_bool (obj=obj@entry=0x555557875c80, value=value@entry=true, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/object.c:905 #23 0x000055555576897b in qdev_init_nofail (dev=dev@entry=0x555557875c80) at hw/core/qdev.c:380 #24 0x0000555555799ead in ide_create_drive (bus=bus@entry=0x555557629630, unit=unit@entry=0, drive=0x5555562b77e0) at hw/ide/qdev.c:122 #25 0x000055555579a746 in pci_ide_create_devs (dev=dev@entry=0x555557628db0, hd_table=hd_table@entry=0x7fffffffd830) at hw/ide/pci.c:440 #26 0x000055555579b165 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7fffffffd830, devfn=<optimized out>) at hw/ide/piix.c:218 #27 0x000055555568ca55 in pc_init1 (machine=0x5555562960a0, pci_enabled=1, kvmclock_enabled=<optimized out>) at /home/berrange/src/virt/qemu/hw/i386/pc_piix.c:256 #28 0x0000555555603ab2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4249 So the safety net is correctly preventing QEMU reading cipher text as if it were plain text, during startup and aborting QEMU to avoid bad usage of this data. For added fun this bug only happens if the encrypted qcow2 file happens to have data written to the first cluster, otherwise the cluster won't be allocated and so qcow2 would not try the decryption routines at all, just return all 0's. That no one even noticed, let alone reported, this bug that has shipped in 2.4.0, 2.5.0 and 2.6.0 shows that the number of actual users of encrypted qcow2 is approximately zero. So rather than fix the crash, and backport it to stable releases, just go ahead with what we have warned users about and disable any use of qcow2 encryption in the system emulators. qemu-img/qemu-io/qemu-nbd are still able to access qcow2 encrypted images for the sake of data conversion. In the future, qcow2 will gain support for the alternative luks format, but when this happens it'll be using the '-object secret' infrastructure for getting keys, which avoids this problematic scenario entirely. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:55 +02:00
Eric Blake	fa16653874	block: Assert that flags are in range Add a new BDRV_REQ_MASK constant, and use it to make sure that caller flags are always valid. Tested with 'make check' and with qemu-iotests on both '-raw' and '-qcow2'; the only failure turned up was fixed in the previous commit. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:55 +02:00
Eric Blake	73698c30ca	block: Avoid bogus flags during mirroring Commit `e253f4b8` converted mirroring from sector-based bdrv_aio_* to byte-based blk_aio_*, but failed to account for the subtle difference in signatures (the former takes a semi-redundant length, the latter takes a flags parameter). Since all of our flags are currently smaller in size than BDRV_SECTOR_SIZE, it has no ill effects until we either perform sub-sector mirroring, or we start asserting that no unexpected flags are set. I found it while testing new asserts when qemu-iotests 132 started warning about an unknown flag 0x200000. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	d46a0bb24d	qcow2: Implement .bdrv_co_pwritev() This changes qcow2 to implement the byte-based .bdrv_co_pwritev interface rather than the sector-based old one. As preallocation uses the same allocation function as normal writes, and the interface of that function needs to be changed, it is converted in the same patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	8556739355	qcow2: Use bytes instead of sectors for QCowL2Meta In preparation for implementing .bdrv_co_pwritev in qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	aaa4d20b49	qcow2: Make copy_sectors() byte based This will allow copy on write operations where the overwritten part of the cluster is not aligned to sector boundaries. Also rename the function because it has nothing to do with sectors any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	ecfe186380	qcow2: Implement .bdrv_co_preadv() Reading from qcow2 images is now byte granularity. Most of the affected code in qcow2 actually gets simpler with this change. The only exception is encryption, which is fixed on 512 bytes blocks; in order to keep this working, bs->request_alignment is set for encrypted images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	b2f65d6b02	qcow2: Work with bytes in qcow2_get_cluster_offset() This patch changes the units that qcow2_get_cluster_offset() uses internally, without touching the interface just yet. This will be done in another patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-16 15:19:55 +02:00
Kevin Wolf	515c2f431e	block: Don't emulate natively supported pwritev flags Drivers that implement .bdrv_co_pwritev() get the flags passed as an argument to said function, but we also unconditionally emulate the flags anyway. We shouldn't do that. Fix this by clearing all flags that the driver supports natively after it returns from .bdrv_co_pwritev(). Fixes: `4df863f3` ('block: Make supported_write_flags a per-bds property') Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-06-08 10:21:09 +02:00
Kevin Wolf	2a9170bcd4	block: Fix bdrv_all_delete_snapshot() error handling The code to exit the loop after bdrv_snapshot_delete_by_id_or_name() returned failure was duplicated. The first copy of it was too early so that the AioContext lock would not be freed. This patch removes it so that only the second, correct copy remains. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-08 10:21:09 +02:00
Denis V. Lunev	f3c3b87dae	qcow2: avoid extra flushes in qcow2 The problem with excessive flushing was found by a couple of performance tests: - parallel directory tree creation (from 2 processes) - 32 cached writes + fsync at the end in a loop For the first one results improved from 2.6 loops/sec to 3.5 loops/sec. Each loop creates 10^3 directories with 10 files in each. For the second one results improved from ~600 fsync/sec to ~1100 fsync/sec. Though, it was run on SSD so it probably won't show such performance gain on rotational media. qcow2_cache_flush() calls bdrv_flush() unconditionally after writing cache entries of a particular cache. This can lead to as many as 2 additional fdatasyncs inside bdrv_flush. We can simply skip all fdatasync calls inside qcow2_co_flush_to_os as bdrv_flush for sure will do the job. These flushes are necessary to keep the right order of writes to the different caches. Though this is not necessary in the current code base as this ordering is ensured through the flush in qcow2_cache_flush_dependency(). Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:09 +02:00
Fam Zheng	6f6071745b	raw-posix: Fetch max sectors for host block device This is sometimes a useful value we should count in. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:09 +02:00
Eric Blake	c1499a5e73	block: Kill bdrv_co_write_zeroes() Now that all drivers have been converted to a byte interface, we no longer need a sector interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	a620f2ae15	vmdk: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	39ad937e16	raw_bsd: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	2ffa76c2bf	raw-posix: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> [ kwolf: Fixed up trace_paio_submit_co() call for qiov == NULL ] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	49a2e48348	qed: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Kill an abuse of the comma operator while at it (fortunately, the semantics were still right). Also, the test for requests not aligned to clusters should be applied always, not just when a backing file is present. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	e88a36ebad	gluster: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	9c21a4220b	blkreplay: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	5544b59f8e	qcow2: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	94d047a35b	iscsi: Convert to bdrv_co_pwrite_zeroes() Another step on our continuing quest to switch to byte-based interfaces. As this is the first byte-based iscsi interface, convert is_request_lun_aligned() into two versions, one for sectors and one for bytes. Also, change from outright -EINVAL failure on an unaligned request, to instead failing with -ENOTSUP to trigger a read-modify-write fallback, particularly since the block layer should be honoring bs->request_alignment to avoid -EINVAL on read/write requests. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	74021bc497	block: Switch bdrv_write_zeroes() to byte interface Rename to bdrv_pwrite_zeroes() to let the compiler ensure we cater to the updated semantics. Do the same for bdrv_co_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	d05aa8bb4a	block: Add .bdrv_co_pwrite_zeroes() Update bdrv_co_do_write_zeroes() to be byte-based, and select between the new byte-based bdrv_co_pwrite_zeroes() or the old bdrv_co_write_zeroes(). The next patches will convert drivers, then remove the old interface. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	cf081fca4e	block: Track write zero limits in bytes Another step towards removing sector-based interfaces: convert the maximum write and minimum alignment values from sectors to bytes. Rename the variables to let the compiler check that all users are converted to the new semantics. The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS is constrained by INT_MAX (this means that we can't even support a 2G write_zeroes, but just under it) - changing operation lengths to unsigned or to 64-bits is a much bigger audit, and debatable if we even want to do it (since at the core, a 32-bit platform will still have ssize_t as its underlying limit on write()). Meanwhile, alignment is changed to 'uint32_t', since it makes no sense to have an alignment larger than the maximum write, and less painful to use an unsigned type with well-defined behavior in bit operations than to have to worry about what happens if a driver mistakenly supplies a negative alignment. Add an assert that no one was trying to use sectors to get a write zeroes larger than 2G, and therefore that a later conversion to bytes won't be impacted by keeping the limit at 32 bits. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	8b18474451	iscsi: Use block size as minimum zero/discard alignment If hardware does not advertise a minimum zero/discard alignment, we still want to guarantee that the block layer will align requests to our blocks, rather than the arbitrary 512-byte BDRV sector size. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	ebb718a5c7	qcow2: Catch more unaligned write_zero into zero cluster is_zero_cluster() and is_zero_cluster_top_locked() are used only by qcow2_co_write_zeroes(). The former is too broad (we don't care if the sectors we are about to overwrite are non-zero, only that all other sectors in the cluster are zero), so it needs to be called up to twice but with smaller limits - rename it along with adding the neeeded parameter. The latter can be inlined for more compact code. The testsuite change shows that we now have a sparser top file when an unaligned write_zeroes overwrites the only portion of the backing file with data. Based on a patch proposal by Denis V. Lunev. CC: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Denis V. Lunev	5a64e94251	qcow2: add tracepoints for qcow2_co_write_zeroes This patch follows guidelines of all other tracepoints in qcow2, like ones in qcow2_co_writev. I think that they should dump values in the same quantities or be changed all together. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Message-Id: <1463476543-3087-4-git-send-email-den@openvz.org> [eblake: typo fix in commit message] Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Denis V. Lunev	ba142846b0	qcow2: simplify logic in qcow2_co_write_zeroes Unaligned requests will occupy only one cluster. This is true since the previous commit. Simplify the code taking this consideration into account. In other words, the caller is now buggy if it ever passes us an unaligned request that crosses cluster boundaries (the only requests that can cross boundaries will be aligned). There are no other changes so far. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Message-Id: <1463476543-3087-3-git-send-email-den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00

1 2 3 4 5 ...

2510 Commits