qemu-e2k

Author	SHA1	Message	Date
Eric Blake	465fe887cc	block: Honor BDRV_REQ_FUA during write_zeroes The block layer has a couple of cases where it can lose Force Unit Access semantics when writing a large block of zeroes, such that the request returns before the zeroes have been guaranteed to land on underlying media. SCSI does not support FUA during WRITESAME(10/16); FUA is only supported if it falls back to WRITE(10/16). But where the underlying device is new enough to not need a fallback, it means that any upper layer request with FUA semantics was silently ignoring BDRV_REQ_FUA. Conversely, NBD has situations where it can support FUA but not ZERO_WRITE; when that happens, the generic block layer fallback to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu 2.6) was losing the FUA flag. The problem of losing flags unrelated to ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since commit `aa7bfbff`, but back then, it did not matter because there was no FUA flag. It became observable when commit `93f5e6d8` paved the way for flags that can impact correctness, when we should have been using bdrv_co_writev_flags() with modified flags. Compare to commit `9eeb6dd`, which got flag manipulation right in bdrv_co_do_zero_pwritev(). Symptoms: I tested with qemu-io with default writethrough cache (which is supposed to use FUA semantics on every write), and targeted an NBD client connected to a server that intentionally did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512', the NBD client sent two operations (NBD_CMD_WRITE then NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE. The fix is to do a cleanup bdrv_co_flush() at the end of the operation if any step in the middle relied on a BDS that does not natively support FUA for that step (note that we don't need to flush after every operation, if the operation is broken into chunks based on bounce-buffer sizing). Each BDS gains a new flag .supported_zero_flags, which parallels the use of .supported_write_flags but only when accessing a zero write operation (the flags MUST be different, because of SCSI having different semantics based on WRITE vs. WRITESAME; and also because BDRV_REQ_MAY_UNMAP only makes sense on zero writes). Also fix some documentation to describe -ENOTSUP semantics, particularly since iscsi depends on those semantics. Down the road, we may want to add a driver where its .bdrv_co_pwritev() honors all three of BDRV_REQ_FUA, BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise this via bs->supported_write_flags for blocks opened by that driver; such a driver should NOT supply .bdrv_co_write_zeroes nor .supported_zero_flags. But none of the drivers touched in this patch want to do that (the act of writing zeroes is different enough from normal writes to deserve a second callback). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Paolo Bonzini	dd7f7ed104	linux-aio: make it more type safe Replace void* with an opaque LinuxAioState type. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	6b98bd6495	block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Extract the handling of io_plug "depth" from linux-aio.c and let the main bdrv_drain loop do nothing but wait on I/O. Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug now operate on all children. The visit order is now symmetrical between plug and unplug, making it possible for formats to implement plug/unplug. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Programmingkid	d0855f1235	block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host Mac OS X can be picky when it comes to allowing the user to use physical devices in QEMU. Most mounted volumes appear to be off limits to QEMU. If an issue is detected, a message is displayed showing the user how to unmount a volume. Now QEMU uses both CD and DVD media. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Veronia Bahaa	f348b6d1a5	util: move declarations out of qemu-common.h Move declarations out of qemu-common.h for functions declared in utils/ files: e.g. include/qemu/path.h for utils/path.c. Move inline functions out of qemu-common.h and into new files (e.g. include/qemu/bcd.h) Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:17 +01:00
Markus Armbruster	da34e65cb4	include/qemu/osdep.h: Don't include qapi/error.h Commit `57cb38b` included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:15 +01:00
Fam Zheng	02650acbc6	raw: Assign bs to file in raw_co_get_block_status Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-5-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	67a0fd2a9b	block: Add "file" output parameter to block status query functions The added parameter can be used to return the BDS pointer which the valid offset is referring to. Its value should be ignored unless BDRV_BLOCK_OFFSET_VALID in ret is set. Until block drivers fill in the right value, let's clear it explicitly right before calling .bdrv_get_block_status. The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest case 102 passing, and will be fixed once all drivers return the right file pointer. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Peter Maydell	80c71a241a	block: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-01-20 13:36:23 +01:00
Christian Borntraeger	972b543c6b	block/raw-posix: avoid bogus fixup for cylinders on DASD disks large volume DASD that have > 64k cylinders do claim to have 0xFFFE cylinders as special value in the old 16 bit field. We want to pass this "token" along to the guest, instead of calculating the real number. Otherwise qemu might fail with "cyls must be between 1 and 65535" Cc: qemu-stable@nongnu.org Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-01-19 17:43:55 +01:00
Kevin Wolf	d657c0c289	raw-posix: Make aio=native option binding Traditionally, aio=native was treated as an advice that could simply be ignored if an error occurs while initialising Linux AIO or the feature wasn't compiled in. This behaviour was deprecated in commit `96518254` (qemu 2.3; error during init) and commit `1501ecc1` (qemu 2.5; not compiled in). This patch changes raw-posix to error out in these cases instead of printing a deprecation warning. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-18 14:34:43 +01:00
Eric Blake	7fb1cf1606	qapi: Don't let implicit enum MAX member collide Now that we guarantee the user doesn't have any enum values beginning with a single underscore, we can use that for our own purposes. Renaming ENUM_MAX to ENUM__MAX makes it obvious that the sentinel is generated. This patch was mostly generated by applying a temporary patch: |diff --git a/scripts/qapi.py b/scripts/qapi.py |index e6d014b..b862ec9 100644 |--- a/scripts/qapi.py |+++ b/scripts/qapi.py |@@ -1570,6 +1570,7 @@ const char *const %(c_name)s_lookup[] = { | max_index = c_enum_const(name, 'MAX', prefix) | ret += mcgen(''' | [%(max_index)s] = NULL, |+// %(max_index)s | }; | ''', | max_index=max_index) then running: $ cat qapi-{types,event}.c tests/test-qapi-types.c | sed -n 's,^// \(.*\)MAX,s|\1MAX|\1_MAX|g,p' > list $ git grep -l _MAX | xargs sed -i -f list The only things not generated are the changes in scripts/qapi.py. Rejecting enum members named 'MAX' is now useless, and will be dropped in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1447836791-369-23-git-send-email-eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> [Rebased to current master, commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-12-17 08:21:28 +01:00
Programmingkid	98caa5bc00	raw-posix.c: Make GetBSDPath() handle caching options Add support for caching options that can be specified from the command line. The CD-ROM raw char device bypasses the host page cache and therefore has alignment requirements. Alignment probing is necessary so only use the raw char device if BDRV_O_NOCACHE is set. This patch fixes -cdrom /dev/cdrom on Mac OS X hosts, where bdrv_read() used to fail due to misaligned requests during image format probing. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-25 14:27:43 +01:00
Fam Zheng	83c98d7b92	block: Drop BlockDriver.bdrv_ioctl Now the callback is not used any more, drop the field along with all implementations in block drivers, which are iscsi and raw. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-8-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Max Reitz	e031f75048	block: Make bdrv_is_inserted() return a bool Make bdrv_is_inserted(), blk_is_inserted(), and the callback BlockDriver.bdrv_is_inserted() return a bool. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:22 +02:00
Max Reitz	f709623b3d	block: Remove host floppy support It has been deprecated as of 2.3, so we can now remove it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:22 +02:00
Stefan Hajnoczi	1501ecc1d8	raw-posix: warn about BDRV_O_NATIVE_AIO if libaio is unavailable raw-posix.c silently ignores BDRV_O_NATIVE_AIO if libaio is unavailable. It is confusing when aio=native performance is identical to aio=threads because the binary was accidentally built without libaio. Print a deprecation warning if -drive aio=native is used with a binary that does not support libaio. There are probably users using aio=native who would be inconvenienced if QEMU suddenly refused to start their guests. In the future this will become an error. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-16 15:34:30 +02:00
Paolo Bonzini	c84b31926f	block: switch from g_slice allocator to malloc Simplify memory allocation by sticking with a single API. GSlice is not that fast anyway (tcmalloc/jemalloc are better). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-12 11:17:45 +01:00
Richard W.M. Jones	73ba05d936	block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue. https://bugzilla.redhat.com/show_bug.cgi?id=1265196 The following command fails on an NFS mountpoint: $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144 Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off cluster_size=65536 preallocation='falloc' lazy_refcounts=off qemu-img: disk.img: Could not preallocate data for the new file: Bad file descriptor The reason turns out to be because NFS doesn't support the posix_fallocate call. glibc emulates it instead. However glibc's emulation involves using the pread(2) syscall. The pread syscall fails with EBADF if the file descriptor is opened without the read open-flag (ie. open (..., O_WRONLY)). I contacted glibc upstream about this, and their response is here: https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9 There are two possible fixes: Use Linux fallocate directly, or (this fix) work around the problem in qemu by opening the file with O_RDWR instead of O_WRONLY. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196 Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-02 13:48:29 +02:00
Max Reitz	bdd03cdf5d	block/raw-posix: Use raw_normalize_devicepath() The filename given to qemu_open() in block/raw-posix.c should generally have been processed by raw_normalize_devicepath(); unless we are only probing (in which case the caller often checks whether the file is a block device or not, and this property will be changed by raw_normalize_devicepath() on NetBSD) or it is about a deprecated device (i.e. floppy). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-04 20:59:48 +02:00
Richard W.M. Jones	25d9747b64	block/raw-posix: Don't think /dev/fd/<NN> is a floppy drive. In libguestfs we use /dev/fd/<NN> to pass pre-opened file descriptors to qemu-img. Lately I've discovered that although this works, qemu believes that these are floppy disk images. That in itself isn't much of a problem, but now qemu prints a warning about host floppy pass-thru being deprecated. Extend the existing test so that it ignores /dev/fd/ as well as /dev/fdset/ A simple test of this, if you are using the bash shell, is: qemu-img info <( cat /dev/null ) without this patch: $ qemu-img info <( cat /dev/null ) qemu-img: Host floppy pass-through is deprecated Support for it will be removed in a future release. qemu-img: Could not open '/dev/fd/63': Could not refresh total sector count: Illegal seek with this patch: $ qemu-img info <( cat /dev/null ) qemu-img: Could not open '/dev/fd/63': Could not refresh total sector count: Illegal seek Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1435761614-31358-1-git-send-email-rjones@redhat.com Fixes: https://bugs.launchpad.net/qemu/+bug/1470536 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Dimitris Aragiorgis	3307ed7b3f	raw-posix: Introduce hdev_is_sg() Until now, an SG device was identified only by checking if its path started with "/dev/sg". Then, hdev_open() would set the bs->sg flag accordingly. The patch relies on the actual properties of the device instead of the specified file path. To this end, test for an SG device (e.g. /dev/sg0) by ensuring that all of the following holds: - The specified file name corresponds to a character device - The device supports the SG_GET_VERSION_NUM ioctl - The device supports the SG_GET_SCSI_ID ioctl Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Message-id: 1435056300-14924-6-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	a93a3982a6	raw-posix: Use DPRINTF for DEBUG_FLOPPY Get rid of several #ifdef DEBUG_FLOPPY and substitute them with DPRINTF. Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-5-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	bcb225550d	raw-posix: DPRINTF instead of DEBUG_BLOCK_PRINT Building the QEMU tools fails if we #define DEBUG_BLOCK inside block/raw-posix.c. Here instead of adding qemu-log.o in block-obj-y so that DEBUG_BLOCK_PRINT can be used, we substitute the latter with a simple DPRINTF() (that does not cause bit-rot). Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-4-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	b192af8acc	block: Use bdrv_is_sg() everywhere Instead of checking bs->sg use bdrv_is_sg() consistently throughout the code. Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-2-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Markus Armbruster	d49b683644	qerror: Move #include out of qerror.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Kevin Wolf	f4a769abaa	raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size Image files with an unaligned image size have a final hole that starts at EOF, i.e. in the middle of a sector. Currently, *pnum == 0 is returned when checking the status of this sector. In qemu-img, this triggers an assertion failure. In order to fix this, one type for the sector that contains EOF must be found. Treating a hole as data is safe, so this patch rounds the calculated number of data sectors up, so that a partial sector at EOF is treated as a full data sector. This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1229394 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1433840108-9996-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 13:58:33 +01:00
Denis V. Lunev	459b4e6612	block: align bounce buffers to page The following sequence int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces a new concept: minimal memory alignment for bounce buffers. The original so-called "optimal" value is actually the minimal required value for alignment. It should be used to validate that the IOVec is properly aligned and a bounce buffer is not required. However, from the performance point of view, it would be better if a bounce buffer or IOVec allocated by QEMU were aligned more strictly. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Kevin Wolf	965182549c	raw-posix: Deprecate aio=threads fallback without O_DIRECT Currently, if the user requests aio=native, but forgets to choose a cache mode that sets O_DIRECT, that request is silently ignored and raw falls back to aio=threads. Deprecate that behaviour so we can make it an error in future qemu versions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>	2015-03-19 12:30:56 +01:00
Markus Armbruster	92a539d22e	raw-posix: Deprecate host floppy passthrough Raise your hand if you have a physical floppy drive in a computer you've powered on in 2015. Okay, I see we got a few weirdos in the audience. That's okay, weirdos are welcome here. Kidding aside, media change detection doesn't fully work, isn't going to be fixed, and floppy passthrough just isn't earning its keep anymore. Deprecate block driver host_floppy now, so we can drop it after a grace period. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-19 11:43:02 +01:00
Stefan Hajnoczi	22d182e82b	block/raw-posix: fix launching with failed disks Since commit `c25f53b06e` ("raw: Probe required direct I/O alignment") QEMU has failed to launch if image files produce I/O errors. Previously, QEMU would launch successfully and the guest would see the errors when attempting I/O. This is a regression and may prevent multipath I/O inside the guest, where QEMU must launch and let the guest figure out by itself which disks are online. Tweak the alignment probing code in raw-posix.c to explicitly look for EINVAL on Linux instead of bailing. The kernel refuses misaligned requests with this error code and other error codes can be ignored. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:24 +01:00
Ekaterina Tumanova	1a9335e4a9	block: Add driver methods to probe blocksizes and geometry Introduce driver methods of defining disk blocksizes (physical and logical) and hard drive geometry. Methods are only implemented for "host_device". For "raw" devices driver calls child's method. For now geometry detection will only work for DASD devices. To check that a local check_for_dasd function was introduced. It calls BIODASDINFO2 ioctl and returns its rc. Blocksizes detection function will probe sizes for DASD devices. Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-4-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:22 +01:00
Ekaterina Tumanova	8a4ed0d1b1	raw-posix: Factor block size detection out of raw_probe_alignment() Put it in new probe_logical_blocksize(). Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-3-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Denis V. Lunev	a6dcf097fa	block/raw-posix: fix compilation warning on OSX block/raw-posix.c:947:19: warning: unused variable 's' [-Wunused-variable] BDRVRawState *s = aiocb->bs->opaque; This variable is used only when one of the following macros is defined: CONFIG_XFS, CONFIG_FALLOCATE, CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE. Fortunately, CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE can be defined only along with CONFIG_FALLOCATE. Therefore checking for CONFIG_XFS or CONFIG_FALLOCATE is enough. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Maydell <peter.maydell@linaro.org> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-09 11:11:59 +01:00
Max Reitz	c0191e763b	block: Remove "growable" from BDS Now that request clamping is done in the BlockBackend, the "growable" field can be removed from the BlockDriverState. All BDSs are now treated as being "growable" (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Programmingkid	728dacbda8	block/raw-posix.c: Fix raw_getlength() on Mac OS X block devices This patch replaces the dummy code in raw_getlength() for block devices on OS X, which always returned LLONG_MAX, with a real implementation that returns the actual block device size. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 18:00:53 +01:00
Denis V. Lunev	1cdc3239f1	block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported. Unfortunately, FALLOC_FL_ZERO_RANGE is supported only on really modern systems and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more mature. The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary for the following reasons: - FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes sparse. In order to retain original functionality we must allocate disk space afterwards. This is done using fallocate(0) call - fallocate(0) without preceding FALLOC_FL_PUNCH_HOLE will do nothing if called above already allocated areas of the file, i.e. the content will not be zeroed This should increase the performance a bit for not-so-modern kernels. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	d50d822219	block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes There is a possibility that we are extending our image and thus writing zeroes beyond the end of the file. In this case we do not need to care about the hole to make sure that there is no data in the file under this offset (pre-condition to fallocate(0) to work). We could simply call fallocate(0). This improves the performance of writing zeroes even on really old platforms which do not have even FALLOC_FL_PUNCH_HOLE. Before the patch do_fallocate was used when either CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE was defined. Now the story is different. CONFIG_FALLOCATE is defined when Linux fallocate is defined; posix_fallocate is a completely different story (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is a mandatory prerequisite for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE, thus we are on the safe side. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	b953f07500	block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes This efficiently writes zeroes on Linux if the kernel is capable enough. FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not including file expansion. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	37cc9f7f68	block/raw-posix: refactor handle_aiocb_write_zeroes a bit Move code dealing with a block device to a separate function. This will allow implementing additional processing for ordinary files. Please note that xfs_code has been moved before checking for s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside. This makes code a bit more consistent. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	0b99171230	block/raw-posix: create do_fallocate helper The pattern do { if (fallocate(s->fd, mode, offset, len) == 0) { return 0; } } while (errno == EINTR); ret = translate_err(-errno); will be commonly useful in next patches. Create helper for it. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	1486df0e31	block/raw-posix: create translate_err helper to merge errno values actually the code if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP || ret == -ENOTTY) { ret = -ENOTSUP; } is present twice and will be added a couple more times. Create helper for this. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Max Reitz	01212d4ed6	block/raw-posix: Fix ret in raw_open_common() The return value must be negative on error; there is one place in raw_open_common() where errp is set, but ret remains 0. Fix it. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	5f535a941e	block: Make essential BlockDriver objects public There are some block drivers which are essential to QEMU and may not be removed: These are raw, file and qcow2 (as the default non-raw format). Make their BlockDriver objects public so they can be directly referenced throughout the block layer without needing to call bdrv_find_format() and having to deal with an error at runtime, while the real problem occurred during linking (where raw, file or qcow2 were not linked into qemu). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Paolo Bonzini	a56ebc6ba4	block: do not use get_clock() Use the external qemu-timer API instead. No one else should be calling cpu_get_clock(), get_clock() and get_clock_realtime() directly; they are internal functions and they should be confined to qemu-timer.c and cpus.c (where the icount implementation resides). All accesses should go through qemu_clock_get_ns. Cc: kwolf@redhat.com Cc: stefanha@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1417010463-3527-2-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Max Reitz	098ffa6674	block/raw-posix: Catch fsync() errors fsync() may fail, and that case should be handled. Reported-by: László Érsek <lersek@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:09:00 +01:00
Max Reitz	731de38052	block/raw-posix: Only sync after successful preallocation The loop which filled the file with zeroes may have been left early due to an error. In that case, the fsync() should be skipped. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:09:00 +01:00
Max Reitz	39411cf3c3	block/raw-posix: Fix preallocating write() loop write() may write less bytes than requested; in this case, the number of bytes written is returned. This is the byte count we should be subtracting from the number of bytes still to be written, and not the byte count we requested to write. Reported-by: László Érsek <lersek@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:08:59 +01:00
Markus Armbruster	d1f06fe665	raw-posix: The SEEK_HOLE code is flawed, rewrite it On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris, but not Linux), try_seek_hole() reports trailing data instead. Additionally, unlikely lseek() failures are treated badly: * When SEEK_HOLE fails, try_seek_hole() reports trailing data. For -ENXIO, there's in fact a trailing hole. Can happen only when something truncated the file since we opened it. * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds, then try_seek_hole() reports a trailing hole. This is okay only when SEEK_DATA failed with -ENXIO (which means the non-trailing hole found by SEEK_HOLE has since become trailing somehow). For other failures (unlikely), it's wrong. * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely), then try_seek_hole() reports bogus data [-1,start), which its caller raw_co_get_block_status() turns into zero sectors of data. Could theoretically lead to infinite loops in code that attempts to scan data vs. hole forward. Rewrite from scratch, with very careful comments. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-11-18 09:45:48 +01:00
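
The three preallocation fixes listed above (39411cf3c3, 731de38052, 098ffa6674) describe one pattern: subtract the byte count that write() actually returned, skip fsync() if the fill loop failed, and report fsync() failures. The following is a minimal sketch of that pattern, not the code from raw-posix.c. The function names `write_zeroes_full` and `preallocate_full`, the 4096-byte buffer, and the EINTR retry are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `total` zero bytes to fd, coping with
 * short writes. Returns 0 on success or a negative errno value. */
static int write_zeroes_full(int fd, size_t total)
{
    static const char buf[4096];    /* static storage, so all zeroes */
    size_t left = total;

    while (left > 0) {
        size_t num = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t n = write(fd, buf, num);

        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* assumption: retry interrupted writes */
            }
            return -errno;
        }
        /* Subtract what was actually written, not what was requested. */
        left -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: fill with zeroes, then sync only if the fill
 * succeeded, and propagate an fsync() failure to the caller. */
static int preallocate_full(int fd, size_t total)
{
    int ret = write_zeroes_full(fd, total);

    if (ret < 0) {
        return ret;                 /* fill failed: skip the fsync() */
    }
    if (fsync(fd) < 0) {
        return -errno;
    }
    return 0;
}
```

A caller would pass an open, writable file descriptor positioned at the start of the region to fill, for example `preallocate_full(fd, 1024 * 1024)`, and treat any negative return value as the errno of the step that failed.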

215 Commits