qemu-e2k

Author	SHA1	Message	Date
Max Reitz	bdd03cdf5d	block/raw-posix: Use raw_normalize_devicepath() The filename given to qemu_open() in block/raw-posix.c should generally have been processed by raw_normalize_devicepath(); unless we are only probing (in which case the caller often checks whether the file is a block device or not, and this property will be changed by raw_normalize_devicepath() on NetBSD) or it is about a deprecated device (i.e. floppy). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-04 20:59:48 +02:00
Richard W.M. Jones	25d9747b64	block/raw-posix: Don't think /dev/fd/<NN> is a floppy drive. In libguestfs we use /dev/fd/<NN> to pass pre-opened file descriptors to qemu-img. Lately I've discovered that although this works, qemu believes that these are floppy disk images. That in itself isn't much of a problem, but now qemu prints a warning about host floppy pass-thru being deprecated. Extend the existing test so that it ignores /dev/fd/ as well as /dev/fdset/ A simple test of this, if you are using the bash shell, is: qemu-img info <( cat /dev/null ) without this patch: $ qemu-img info <( cat /dev/null ) qemu-img: Host floppy pass-through is deprecated Support for it will be removed in a future release. qemu-img: Could not open '/dev/fd/63': Could not refresh total sector count: Illegal seek with this patch: $ qemu-img info <( cat /dev/null ) qemu-img: Could not open '/dev/fd/63': Could not refresh total sector count: Illegal seek Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1435761614-31358-1-git-send-email-rjones@redhat.com Fixes: https://bugs.launchpad.net/qemu/+bug/1470536 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Dimitris Aragiorgis	3307ed7b3f	raw-posix: Introduce hdev_is_sg() Until now, an SG device was identified only by checking if its path started with "/dev/sg". Then, hdev_open() would set the bs->sg flag accordingly. The patch relies on the actual properties of the device instead of the specified file path. To this end, test for an SG device (e.g. /dev/sg0) by ensuring that all of the following holds: - The specified file name corresponds to a character device - The device supports the SG_GET_VERSION_NUM ioctl - The device supports the SG_GET_SCSI_ID ioctl Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Message-id: 1435056300-14924-6-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	a93a3982a6	raw-posix: Use DPRINTF for DEBUG_FLOPPY Get rid of several #ifdef DEBUG_FLOPPY and substitute them with DPRINTF. Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-5-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	bcb225550d	raw-posix: DPRINTF instead of DEBUG_BLOCK_PRINT Building the QEMU tools fails if we #define DEBUG_BLOCK inside block/raw-posix.c. Here instead of adding qemu-log.o in block-obj-y so that DEBUG_BLOCK_PRINT can be used, we substitute the latter with a simple DPRINTF() (that does not cause bit-rot). Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-4-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	b192af8acc	block: Use bdrv_is_sg() everywhere Instead of checking bs->sg use bdrv_is_sg() consistently throughout the code. Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-2-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Markus Armbruster	d49b683644	qerror: Move #include out of qerror.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Kevin Wolf	f4a769abaa	raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size Image files with an unaligned image size have a final hole that starts at EOF, i.e. in the middle of a sector. Currently, *pnum == 0 is returned when checking the status of this sector. In qemu-img, this triggers an assertion failure. In order to fix this, one type for the sector that contains EOF must be found. Treating a hole as data is safe, so this patch rounds the calculated number of data sectors up, so that a partial sector at EOF is treated as a full data sector. This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1229394 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1433840108-9996-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 13:58:33 +01:00
Denis V. Lunev	459b4e6612	block: align bounce buffers to page The following sequence int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
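The MAX(page size, 4k) rule from 459b4e6612 can be illustrated with a small standalone sketch. This is not the QEMU code: `bounce_buffer_alignment()` and `alloc_bounce_buffer()` are hypothetical names, and plain `posix_memalign()` stands in for QEMU's own allocator.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): pick the bounce buffer
 * alignment as MAX(page size, 4096), as the commit message describes.
 */
static size_t bounce_buffer_alignment(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t align = 4096; /* 4k: largest known sector size on real HDDs */

    if (page > 0 && (size_t)page > align) {
        align = (size_t)page;
    }
    return align;
}

/*
 * Hypothetical helper: allocate a buffer with that alignment.
 * Returns NULL on failure; the caller releases it with free().
 */
static void *alloc_bounce_buffer(size_t size)
{
    void *buf = NULL;

    if (posix_memalign(&buf, bounce_buffer_alignment(), size) != 0) {
        return NULL;
    }
    return buf;
}
```

A buffer obtained this way starts on a 4096-byte boundary, which per the commit message lets the kernel avoid splitting each O_DIRECT request in two.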
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for alignment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Kevin Wolf	965182549c	raw-posix: Deprecate aio=threads fallback without O_DIRECT Currently, if the user requests aio=native, but forgets to choose a cache mode that sets O_DIRECT, that request is silently ignored and raw falls back to aio=threads. Deprecate that behaviour so we can make it an error in future qemu versions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>	2015-03-19 12:30:56 +01:00
Markus Armbruster	92a539d22e	raw-posix: Deprecate host floppy passthrough Raise your hand if you have a physical floppy drive in a computer you've powered on in 2015. Okay, I see we got a few weirdos in the audience. That's okay, weirdos are welcome here. Kidding aside, media change detection doesn't fully work, isn't going to be fixed, and floppy passthrough just isn't earning its keep anymore. Deprecate block driver host_floppy now, so we can drop it after a grace period. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-19 11:43:02 +01:00
Stefan Hajnoczi	22d182e82b	block/raw-posix: fix launching with failed disks Since commit `c25f53b06e` ("raw: Probe required direct I/O alignment") QEMU has failed to launch if image files produce I/O errors. Previously, QEMU would launch successfully and the guest would see the errors when attempting I/O. This is a regression and may prevent multipath I/O inside the guest, where QEMU must launch and let the guest figure out by itself which disks are online. Tweak the alignment probing code in raw-posix.c to explicitly look for EINVAL on Linux instead of bailing. The kernel refuses misaligned requests with this error code and other error codes can be ignored. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:24 +01:00
Ekaterina Tumanova	1a9335e4a9	block: Add driver methods to probe blocksizes and geometry Introduce driver methods of defining disk blocksizes (physical and logical) and hard drive geometry. Methods are only implemented for "host_device". For "raw" devices driver calls child's method. For now geometry detection will only work for DASD devices. To check that a local check_for_dasd function was introduced. It calls BIODASDINFO2 ioctl and returns its rc. Blocksizes detection function will probe sizes for DASD devices. Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-4-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:22 +01:00
Ekaterina Tumanova	8a4ed0d1b1	raw-posix: Factor block size detection out of raw_probe_alignment() Put it in new probe_logical_blocksize(). Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-3-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Denis V. Lunev	a6dcf097fa	block/raw-posix: fix compilation warning on OSX block/raw-posix.c:947:19: warning: unused variable 's' [-Wunused-variable] BDRVRawState *s = aiocb->bs->opaque; This variable is used only when one of the following macros is defined: CONFIG_XFS, CONFIG_FALLOCATE, CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE. Fortunately, CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE could be defined only along with CONFIG_FALLOCATE. Therefore checking for CONFIG_XFS or CONFIG_FALLOCATE would be enough. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Maydell <peter.maydell@linaro.org> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-09 11:11:59 +01:00
Max Reitz	c0191e763b	block: Remove "growable" from BDS Now that request clamping is done in the BlockBackend, the "growable" field can be removed from the BlockDriverState. All BDSs are now treated as being "growable" (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Programmingkid	728dacbda8	block/raw-posix.c: Fix raw_getlength() on Mac OS X block devices This patch replaces the dummy code in raw_getlength() for block devices on OS X, which always returned LLONG_MAX, with a real implementation that returns the actual block device size. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 18:00:53 +01:00
Denis V. Lunev	1cdc3239f1	block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported. Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more mature. The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due to the following reasons: - FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes sparse. In order to retain original functionality we must allocate disk space afterwards. This is done using fallocate(0) call - fallocate(0) without preceding FALLOC_FL_PUNCH_HOLE will do nothing if called above already allocated areas of the file, i.e. the content will not be zeroed This should increase the performance a bit for not-so-modern kernels. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	d50d822219	block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes There is a possibility that we are extending our image and thus writing zeroes beyond the end of the file. In this case we do not need to care about the hole to make sure that there is no data in the file under this offset (pre-condition to fallocate(0) to work). We could simply call fallocate(0). This improves the performance of writing zeroes even on really old platforms which do not have even FALLOC_FL_PUNCH_HOLE. Before the patch do_fallocate was used when either CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined. Now the story is different. CONFIG_FALLOCATE is defined when Linux fallocate is defined, posix_fallocate is completely different story (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequisite for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE thus we are on the safe side. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	b953f07500	block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes This efficiently writes zeroes on Linux if the kernel is capable enough. FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not including file expansion. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	37cc9f7f68	block/raw-posix: refactor handle_aiocb_write_zeroes a bit move code dealing with a block device to a separate function. This will allow to implement additional processing for ordinary files. Please note, that xfs_code has been moved before checking for s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside. This makes code a bit more consistent. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	0b99171230	block/raw-posix: create do_fallocate helper The pattern do { if (fallocate(s->fd, mode, offset, len) == 0) { return 0; } } while (errno == EINTR); ret = translate_err(-errno); will be commonly useful in next patches. Create helper for it. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	1486df0e31	block/raw-posix: create translate_err helper to merge errno values actually the code if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP || ret == -ENOTTY) { ret = -ENOTSUP; } is present twice and will be added a couple more times. Create helper for this. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
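The errno merging quoted in 1486df0e31 fits in a few lines when written as a standalone function. The condition is taken from the commit message; the exact signature is an assumption, since the log does not show the final helper.

```c
#include <errno.h>

/*
 * Collapse the various "operation not available" errors into -ENOTSUP.
 * The argument and the return value are negative errno codes (or 0).
 * Signature assumed for illustration; only the condition is from the log.
 */
static int translate_err(int err)
{
    if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
        err == -ENOTTY) {
        err = -ENOTSUP;
    }
    return err;
}
```

Callers then only need to compare against `-ENOTSUP` to decide whether to fall back to a slower path, while other errors such as `-EIO` pass through unchanged.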
Max Reitz	01212d4ed6	block/raw-posix: Fix ret in raw_open_common() The return value must be negative on error; there is one place in raw_open_common() where errp is set, but ret remains 0. Fix it. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	5f535a941e	block: Make essential BlockDriver objects public There are some block drivers which are essential to QEMU and may not be removed: These are raw, file and qcow2 (as the default non-raw format). Make their BlockDriver objects public so they can be directly referenced throughout the block layer without needing to call bdrv_find_format() and having to deal with an error at runtime, while the real problem occurred during linking (where raw, file or qcow2 were not linked into qemu). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Paolo Bonzini	a56ebc6ba4	block: do not use get_clock() Use the external qemu-timer API instead. No one else should be calling cpu_get_clock(), get_clock() and get_clock_realtime() directly; they are internal functions and they should be confined to qemu-timer.c and cpus.c (where the icount implementation resides). All accesses should go through qemu_clock_get_ns. Cc: kwolf@redhat.com Cc: stefanha@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1417010463-3527-2-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Max Reitz	098ffa6674	block/raw-posix: Catch fsync() errors fsync() may fail, and that case should be handled. Reported-by: László Érsek <lersek@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:09:00 +01:00
Max Reitz	731de38052	block/raw-posix: Only sync after successful preallocation The loop which filled the file with zeroes may have been left early due to an error. In that case, the fsync() should be skipped. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:09:00 +01:00
Max Reitz	39411cf3c3	block/raw-posix: Fix preallocating write() loop write() may write less bytes than requested; in this case, the number of bytes written is returned. This is the byte count we should be subtracting from the number of bytes still to be written, and not the byte count we requested to write. Reported-by: László Érsek <lersek@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:08:59 +01:00
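The short-write handling described in 39411cf3c3 can be sketched as follows. This is an illustration only, not the code from raw-posix.c: `write_zeroes_fully()` is a hypothetical name, and the 4096-byte chunk size is an arbitrary choice.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `total` zero bytes to fd, coping with
 * short writes. Returns 0 on success or a negative errno on failure.
 */
static int write_zeroes_fully(int fd, size_t total)
{
    static const char zeroes[4096]; /* static storage is zero-initialized */
    size_t left = total;

    while (left > 0) {
        size_t chunk = left < sizeof(zeroes) ? left : sizeof(zeroes);
        ssize_t written = write(fd, zeroes, chunk);

        if (written < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing anything: retry */
            }
            return -errno;
        }
        /*
         * Subtract what was actually written, not what was requested;
         * write() may return fewer bytes than `chunk`.
         */
        left -= (size_t)written;
    }
    return 0;
}
```

Subtracting `chunk` instead of `written` would end the loop early after a short write and leave the file smaller than requested, which is the bug the commit describes.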
Markus Armbruster	d1f06fe665	raw-posix: The SEEK_HOLE code is flawed, rewrite it On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris, but not Linux), try_seek_hole() reports trailing data instead. Additionally, unlikely lseek() failures are treated badly: * When SEEK_HOLE fails, try_seek_hole() reports trailing data. For -ENXIO, there's in fact a trailing hole. Can happen only when something truncated the file since we opened it. * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds, then try_seek_hole() reports a trailing hole. This is okay only when SEEK_DATA failed with -ENXIO (which means the non-trailing hole found by SEEK_HOLE has since become trailing somehow). For other failures (unlikely), it's wrong. * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely), then try_seek_hole() reports bogus data [-1,start), which its caller raw_co_get_block_status() turns into zero sectors of data. Could theoretically lead to infinite loops in code that attempts to scan data vs. hole forward. Rewrite from scratch, with very careful comments. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-11-18 09:45:48 +01:00
Markus Armbruster	c4875e5b22	raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Commit `5500316` (May 2012) implemented raw_co_is_allocated() as follows: 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek() 3. Else pretend there are no holes Later on, raw_co_is_allocated() was generalized to raw_co_get_block_status(). Commit `4f11aa8` (May 2014) changed it to try the three methods in order until success, because "there may be implementations which support [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice versa." Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC. Commit `38c4d0a` (Sep 2014) added it. Because that's a significant speed hit, the next commit `7c159037` put SEEK_HOLE/SEEK_DATA first. As you see, the obvious use of FIEMAP is wrong, and the correct use is slow. I guess this puts it somewhere between -7 "The obvious use is wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard to Misuse scale[*]. "Fortunately", the FIEMAP code is used only when * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is: Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it was introduced for ext4 and btrfs) and 2012. * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails: Unlikely. Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for bugs. Worse, bugs hiding there can theoretically bite even on a host that has SEEK_HOLE/SEEK_DATA. I don't want to worry about this crap, not even theoretically. Get rid of it. [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-11-18 09:45:35 +01:00
Markus Armbruster	be2ebc6dad	raw-posix: Fix comment for raw_co_get_block_status() Missed in commit `705be72`. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-11-18 09:44:02 +01:00
Max Reitz	d7f62751a1	raw-posix: raw_co_get_block_status() return value Instead of generating the full return value thrice in try_fiemap(), try_seek_hole() and as a fall-back in raw_co_get_block_status() itself, generate the value only in raw_co_get_block_status(). While at it, also remove the pnum parameter from try_fiemap() and try_seek_hole(). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414148280-17949-3-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:47 +00:00
Max Reitz	e6d7ec32dd	raw-posix: Fix raw_co_get_block_status() after EOF As its comment states, raw_co_get_block_status() should unconditionally return 0 and set pnum to 0 for after EOF. An assertion after lseek(..., SEEK_HOLE) tried to catch this case by asserting that errno != -ENXIO (which would indicate a position after the EOF); but it should be errno != ENXIO instead. Regardless of that, there should be no such assertion at all. If bdrv_getlength() returned an outdated value and the image has been resized outside of qemu, lseek() will return with errno == ENXIO. Just return that value as an error then. Setting pnum to 0 and returning 0 should not be done here, as in that case we should update the device length as well. So, from qemu's perspective, the file has not been resized; it's just that there was an error querying sectors beyond a certain point (the actual file size). Additionally, nb_sectors should be clamped against the image end. This was probably not an issue if FIEMAP or SEEK_HOLE/SEEK_DATA worked, but the fallback did not take this case into account. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414148280-17949-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:47 +00:00
Roger Pau Monne	3cad83075c	block: char devices on FreeBSD are not behind a pager Introduce a new flag to mark devices that require requests to be aligned and replace the usage of BDRV_O_NOCACHE and O_DIRECT with this flag when appropriate. If a character device is used as a backend on a FreeBSD host set this flag unconditionally. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-10-23 16:56:53 +02:00
Markus Armbruster	097310b53e	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7c84b1b831	block: Rename BlockDriverAIOCB* to BlockAIOCB* I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Tony Breeds	7c15903789	block/raw-posix: use seek_hole ahead of fiemap try_fiemap() uses FIEMAP_FLAG_SYNC which has a significant performance impact. Prefer seek_hole() over fiemap() to avoid this impact where possible. seek_hole is more widely used and, arguably, has potential to be optimised in the kernel. Reported-By: Michael Steffens <michael_steffens@posteo.de> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Pádraig Brady <pbrady@redhat.com> Cc: Eric Blake <eblake@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Tony Breeds	38c4d0aea3	block/raw-posix: Fix disk corruption in try_fiemap Using fiemap without FIEMAP_FLAG_SYNC is a known corrupter. Add the FIEMAP_FLAG_SYNC flag to the FS_IOC_FIEMAP ioctl. This has the downside of significantly reducing performance. Reported-By: Michael Steffens <michael_steffens@posteo.de> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Pádraig Brady <pbrady@redhat.com> Cc: Eric Blake <eblake@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Kevin Wolf	ed9114356b	raw-posix: Fix build without posix_fallocate() Check for the presence of posix_fallocate() in configure and only compile in support for PREALLOC_MODE_FALLOC when it's there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-09-29 16:28:24 +01:00
Hu Tao	06247428be	raw-posix: Add falloc and full preallocation option This patch adds a new option preallocation for raw format, and implements falloc and full preallocation. Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-12 15:43:06 +02:00
Hu Tao	180e95265e	block: don't convert file size to sector size and avoid converting it back later. Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-12 15:43:06 +02:00
Hu Tao	c2eb918e32	block: round up file size to nearest sector Currently the file size requested by user is rounded down to nearest sector, causing the actual file size could be a bit less than the size user requested. Since some formats (like qcow2) record virtual disk size in bytes, this can make the last few bytes cannot be accessed. This patch fixes it by rounding up file size to nearest sector so that the actual file size is no less than the requested file size. Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-12 15:43:06 +02:00
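The rounding change in c2eb918e32 comes down to simple integer arithmetic, shown here as a minimal sketch. `SECTOR_SIZE` and `round_up_to_sector()` are hypothetical names, and the 512-byte sector size is an assumption for illustration.

```c
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumed sector size for this illustration */

/*
 * Hypothetical helper: round a non-negative byte count up to the next
 * multiple of SECTOR_SIZE, so the result is never smaller than the
 * requested size.
 */
static int64_t round_up_to_sector(int64_t bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}
```

With rounding down, a request for 1000 bytes would yield a 512-byte file and leave the last 488 bytes of the virtual disk inaccessible; rounding up yields 1024 bytes instead.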
Stefan Hajnoczi	61ed73cff4	raw-posix: fix O_DIRECT short reads The following O_DIRECT read from a <512 byte file fails: $ truncate -s 320 test.img $ qemu-io -n -c 'read -P 0 0 512' test.img qemu-io: can't open device test.img: Could not read image for determining its format: Invalid argument Note that qemu-io completes successfully without the -n (O_DIRECT) option. This patch fixes qemu-iotests ./check -nocache -vmdk 059. Cc: qemu-stable@nongnu.org Suggested-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-22 11:00:56 +02:00
Markus Armbruster	5839e53bbc	block: Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. Patch created with Coccinelle, with two manual changes on top: * Add const to bdrv_iterate_format() to keep the types straight * Convert the allocation in bdrv_drop_intermediate(), which Coccinelle inexplicably misses Coccinelle semantic patch: @@ type T; @@ -g_malloc(sizeof(T)) +g_new(T, 1) @@ type T; @@ -g_try_malloc(sizeof(T)) +g_try_new(T, 1) @@ type T; @@ -g_malloc0(sizeof(T)) +g_new0(T, 1) @@ type T; @@ -g_try_malloc0(sizeof(T)) +g_try_new0(T, 1) @@ type T; expression n; @@ -g_malloc(sizeof(T) * (n)) +g_new(T, n) @@ type T; expression n; @@ -g_try_malloc(sizeof(T) * (n)) +g_try_new(T, n) @@ type T; expression n; @@ -g_malloc0(sizeof(T) * (n)) +g_new0(T, n) @@ type T; expression n; @@ -g_try_malloc0(sizeof(T) * (n)) +g_try_new0(T, n) @@ type T; expression p, n; @@ -g_realloc(p, sizeof(T) * (n)) +g_renew(T, p, n) @@ type T; expression p, n; @@ -g_try_realloc(p, sizeof(T) * (n)) +g_try_renew(T, p, n) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
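The first safety argument in 5839e53bbc, catching a multiplication that overflows size_t, can be shown without GLib. The sketch below is a plain libc stand-in, not GLib's implementation: `alloc_array()` is a hypothetical name, and it returns NULL on overflow (closer to the g_try_new() style) rather than aborting.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for an overflow-checked array allocation.
 * A bare malloc(elem_size * n) would silently wrap around for huge n
 * and return a buffer far smaller than the caller expects.
 */
static void *alloc_array(size_t elem_size, size_t n)
{
    if (elem_size != 0 && n > SIZE_MAX / elem_size) {
        return NULL; /* elem_size * n would overflow size_t */
    }
    return malloc(elem_size * n);
}
```

The second argument, type safety, comes from the macro form: g_new(T, n) yields a `T *`, so assigning the result to a pointer of the wrong type draws a compiler diagnostic, which a `void *` return cannot provide.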
Kevin Wolf	50d4a858e6	raw-posix: Handle failure for potentially large allocations Some code in the block layer makes potentially huge allocations. Failure is not completely unexpected there, so avoid aborting qemu and handle out-of-memory situations gracefully. This patch addresses the allocations in the raw-posix block driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:15 +02:00
Kevin Wolf	df26a35025	raw-posix: Fail gracefully if no working alignment is found If qemu couldn't find out what O_DIRECT alignment to use with a given file, it would run into assert(bdrv_opt_mem_align(bs) != 0); in block.c and confuse users. This adds a more descriptive error message for such cases. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-18 13:18:43 +01:00
Kevin Wolf	3baca89139	block: Add Error argument to bdrv_refresh_limits() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-18 13:18:43 +01:00
Kevin Wolf	8eb029c26e	block: Assert qiov length matches request length At least raw-posix relies on this because it can allocate bounce buffers based on the request length, but access it using all of the qiov entries later. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-07-14 12:03:20 +02:00


196 Commits