qemu-e2k

Author	SHA1	Message	Date
Max Reitz	b65a5e12a4	block: Add Error parameter to bdrv_find_protocol() The argument given to bdrv_find_protocol() is just a file name, which makes it difficult for the caller to reconstruct what protocol bdrv_find_protocol() was hoping to find. This patch adds an Error parameter to that function to solve this issue. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:18 +00:00
Max Reitz	ca49a4fdb3	block: Add blk_new_open() blk_new_with_bs() creates a BlockBackend with an empty BlockDriverState attached to it. Empty BDSs are not nice, therefore add an alternative function which combines blk_new_with_bs() with bdrv_open(). Note: In contrast to bdrv_open() which takes a BlockDriver parameter, blk_new_open() does not take such a parameter. This is because bdrv_open() opens a BlockDriverState, therefore it is natural to be able to set the BlockDriver for that BDS. The fact that bdrv_open() can open more than a single BDS is merely some form of a byproduct. blk_new_open() on the other hand is intended to be used to create a whole tree of BlockDriverStates. Therefore, setting a single BlockDriver does not make much sense. Instead, the drivers to be used for each of the nodes must be configured through the "options" QDict; including the driver of the root BDS. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1423162705-32065-3-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:18 +00:00
Max Reitz	1ef01253eb	block: Lift some BDS functions to the BlockBackend Create the blk_* counterparts for the following bdrv_* functions (which make sense to call on the BlockBackend level): - bdrv_co_write_zeroes() - bdrv_write_compressed() - bdrv_truncate() - bdrv_nb_sectors() - bdrv_discard() - bdrv_load_vmstate() - bdrv_save_vmstate() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:18 +00:00
Jeff Cody	a7be17bee8	block: vmdk - fixed sizeof() error The size compared should be PATH_MAX, rather than sizeof(char *). Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 46d873261433f4527e88885582f96942d61758d6.1423592487.git.jcody@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:17 +00:00
Bin Wu	141cabe6f1	nbd: fix the co_queue multi-adding bug When we tested the VM migartion between different hosts with NBD devices, we found if we sent a cancel command after the drive_mirror was just started, a coroutine re-enter error would occur. The stack was as follow: (gdb) bt 00) 0x00007fdfc744d885 in raise () from /lib64/libc.so.6 01) 0x00007fdfc744ee61 in abort () from /lib64/libc.so.6 02) 0x00007fdfca467cc5 in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0) at qemu-coroutine.c:118 03) 0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedb400) at qemu-coroutine-lock.c:59 04) 0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8, to=0x7fdfcaedb400) at qemu-coroutine.c:96 05) 0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0) at qemu-coroutine.c:123 06) 0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedbdc0) at qemu-coroutine-lock.c:59 07) 0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8, to=0x7fdfcaedbdc0) at qemu-coroutine.c:96 08) 0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedbdc0, opaque=0x0) at qemu-coroutine.c:123 09) 0x00007fdfca4a1fa4 in nbd_recv_coroutines_enter_all (s=0x7fdfcaef7dd0) at block/nbd-client.c:41 10) 0x00007fdfca4a1ff9 in nbd_teardown_connection (client=0x7fdfcaef7dd0) at block/nbd-client.c:50 11) 0x00007fdfca4a20f0 in nbd_reply_ready (opaque=0x7fdfcaef7dd0) at block/nbd-client.c:92 12) 0x00007fdfca45ed80 in aio_dispatch (ctx=0x7fdfcae15e90) at aio-posix.c:144 13) 0x00007fdfca45ef1b in aio_poll (ctx=0x7fdfcae15e90, blocking=false) at aio-posix.c:222 14) 0x00007fdfca448c34 in aio_ctx_dispatch (source=0x7fdfcae15e90, callback=0x0, user_data=0x0) at async.c:212 15) 0x00007fdfc8f2f69a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 16) 0x00007fdfca45c391 in glib_pollfds_poll () at main-loop.c:190 17) 0x00007fdfca45c489 in os_host_main_loop_wait (timeout=1483677098) at main-loop.c:235 18) 0x00007fdfca45c57b in main_loop_wait (nonblocking=0) at main-loop.c:484 19) 0x00007fdfca25f403 in main_loop () at vl.c:2249 20) 0x00007fdfca266fc2 in main (argc=42, argv=0x7ffff517d638, envp=0x7ffff517d790) at vl.c:4814 We find the nbd_recv_coroutines_enter_all function (triggered by a cancel command or a network connection breaking down) will enter a coroutine which is waiting for the sending lock. If the lock is still held by another coroutine, the entering coroutine will be added into the co_queue again. Latter, when the lock is released, a coroutine re-enter error will occur. This bug can be fixed simply by delaying the setting of recv_coroutine as suggested by paolo. After applying this patch, we have tested the cancel operation in mirror phase looply for more than 5 hous and everything is fine. Without this patch, a coroutine re-enter error will occur in 5 minutes. Signed-off-by: Bn Wu <wu.wubin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1423552846-3896-1-git-send-email-wu.wubin@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:17 +00:00
Max Reitz	f53a829bb9	nbd: Drop BDS backpointer Before this patch, the "opaque" pointer in an NBD BDS points to a BDRVNBDState, which contains an NbdClientSession object, which in turn contains a pointer to the BDS. This pointer may become invalid due to bdrv_swap(), so drop it, and instead pass the BDS directly to the nbd-client.c functions which then retrieve the NbdClientSession object from there. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1423256778-3340-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 14:36:03 +00:00
Programmingkid	728dacbda8	block/raw-posix.c: Fix raw_getlength() on Mac OS X block devices This patch replaces the dummy code in raw_getlength() for block devices on OS X, which always returned LLONG_MAX, with a real implementation that returns the actual block device size. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 18:00:53 +01:00
Max Reitz	8c44dfbc62	qcow2: Rewrite qcow2_alloc_bytes() qcow2_alloc_bytes() is a function with insufficient error handling and an unnecessary goto. This patch rewrites it. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Alberto Garcia	8e8cb375e0	block: Give always priority to unused entries in the qcow2 L2 cache The current algorithm to replace entries from the L2 cache gives priority to newer hits by dividing the hit count of all existing entries by two everytime there is a cache miss. However, if there are several cache misses the hit count of the existing entries can easily go down to 0. This will result in those entries being replaced even when there are others that have never been used. This problem is more noticeable with larger disk images and cache sizes, since the chances of having several misses before the cache is full are higher. If we make sure that the hit count can never go down to 0 again, unused entries will always have priority. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Denis V. Lunev	fa21e6faa6	nbd: fix max_discard/max_transfer_length nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Lieven <pl@kamp.de> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Max Reitz	1ce52846d3	nbd: Improve error messages This patch makes use of the Error object for nbd_receive_negotiate() so that errors during negotiation look nicer. Furthermore, this patch adds an additional error message if the received magic was wrong, but would be correct for the other protocol version, respectively: So if an export name was specified, but the NBD server magic corresponds to an old handshake, this condition is explicitly signaled to the user, and vice versa. As these messages are now part of the "Could not open image" error message, additional filtering has to be employed in iotest 083, which this patch does as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Jeff Cody	e729fa6afe	block: fix off-by-one error in qcow and qcow2 This fixes an off-by-one error introduced in `9a29e18`. Both qcow and qcow2 need to make sure to leave room for string terminator '\0' for the backing file, so the max length of the non-terminated string is either 1023 or PATH_MAX - 1. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Stefan Hajnoczi	0adfa1ed65	qed: check for header size overflow Header size is denoted in clusters. The maximum cluster size is 64 MB but there is no limit on header size. Check for uint32_t overflow in case the header size field has a whacky value. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1421065893-18875-2-git-send-email-stefanha@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	177b75104d	block/dmg: improve zeroes handling Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB is seen in the real world). These blocks (type 2) do not need to be extracted into a temporary buffer, there is no need to allocate memory for these blocks nor to check its length. (For the test image, the maximum uncompressed size is 1054371 bytes, probably for a bzip2-compressed block.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-13-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	6b383c08c4	block/dmg: support bzip2 block entry types This patch adds support for bzip2-compressed block entries as introduced with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image). It was tested against a 5.2G "OS X Yosemite" installation image which stores the BLXX block in the XML property list (instead of resource forks) and has over 5k chunks. New configure entries are added (--enable-bzip2 / --disable-bzip2) to control inclusion of bzip2 functionality (which requires linking against libbz2). The help message suggests that this option is needed for DMG files, but the tests are generic enough that other parts of QEMU can use bzip2 if needed. The identifiers are based on http://newosxbook.com/DMG.html. The decompression routines are based on the zlib case, but as there is no way to reset the decompression state (unlike zlib), memory is allocated and deallocated for every decompression. This should not be problematic as the decompression takes most of the time and as blocks are typically about/over 1 MiB in size, only one allocation is done every 2000 sectors. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1420566495-13284-12-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	a8b10c6ead	block/dmg: factor out block type check In preparation for adding bzip2 support, split the type check into a separate function. Make all offsets relative to the begin of a chunk such that it is easier to recognize the position without having to add up all offsets. Some comments are added to describe the fields. There is no functional change. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-11-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	66ec3bba97	block/dmg: use SectorNumber from BLKX header Previously the sector table parsing relied on the previous offset of the DMG file. Now it uses the sector number from the BLKX header (see http://newosxbook.com/DMG.html). The implementation of dmg2img (from vu1tur) does not base the output sector on the location of the terminator (0xffffffff) either so it should be safe to drop this dependency on the previous state. (It makes somehow makes sense, a terminator should halt further processing of a block and is perhaps used to preallocate some space.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-10-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	c6d34865fa	block/dmg: fix sector data offset calculation This patch addresses two issues: - The data fork offset was not taken into account, resulting in failure to read an InstallESD.dmg file (5164763151 bytes) which had a non-zero DataForkOffset field. - The offset of the previous block ("partition") was unconditionally added to the current block because older files would start the input offset of a new block at zero. Newer files (including vlc-2.1.5.dmg, tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in reads because these files have chunk offsets, relative to the begin of a data fork. Now the data offset of the mish is taken into account. While we could check that the data_offset is within the data fork, let's not do that here as it would only result in parse failures on invalid files (rather than gracefully handling such bad files). dmg_read will error out if the offset is incorrect. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-9-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	8daf425794	block/dmg: set virtual size to a non-zero value Right now the virtual size is always reported as zero which makes it impossible to convert between formats. After this patch, the number of sectors will be read from the trailer ("koly" block). To verify the behavior, the output of `dmg2img foo.dmg foo.img` was compared against `qemu-img convert -f dmg -O raw foo.dmg foo.raw`. The tests showed that the file contents are exactly the same, except that QEMU creates a slightly larger file (it matches the total sectors count). Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-8-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	0599e56ed4	block/dmg: process XML plists The format is simple enough to avoid using a full-blown XML parser. It assumes that all BLKX items begin with the "mish" magic word, therefore it is not a problem if other values get matched which are not a BLKX block. The offsets are based on the description at http://newosxbook.com/DMG.html For compatibility with glib 2.12, use g_base64_decode (which additionally requires an extra buffer allocation) instead of g_base64_decode_inplace (which is only available since glib 2.20). Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-7-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	f6e6652d7c	block/dmg: validate chunk size to avoid overflow Previously the chunk size was not checked, allowing for a large memory allocation. This patch checks whether the chunks size is within the resource fork length, and whether the resource fork is below the trailer of the dmg file. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-6-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	7aee37b93a	block/dmg: process a buffer instead of reading ints As the decoded plist XML is not a pointer in the file, dmg_read_mish_block must be able to process a buffer instead of a file pointer. Since the full buffer must be processed, let's change the return value again to just a success flag. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-5-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	b0e8dc5d54	block/dmg: extract processing of resource forks Besides the offset, also read the resource length. This length is now used in the extracted function to verify the end of the resource fork against "count" from the resource fork. Instead of relying on the value of offset to conclude whether the resource fork is available or not (info_begin==0), check the rsrc_fork_length instead. This would allow a dmg file to begin with a resource fork. This seemingly unnecessary restriction was found while trying to craft a DMG file by hand. Other changes: - Do not require resource data offset to be 0x100 (but check that it is within bounds though). - Further improve boundary checking (resource data must be within the resource fork). - Use correct value for resource data length (spotted by John Snow) - Consider the resource data offset when determining info_end. This fixes an EINVAL on the tuxpaint dmg example. The resource fork format is documented at https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151 Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-4-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	65a1c7c96a	block/dmg: extract mish block decoding functionality Extract the mish block decoder such that this can be used for other formats in the future. A new DmgHeaderState struct is introduced to share state while decoding. The code is kept unchanged as much as possible, a "fail" label is added for example where a simple return would probably do. In dmg_open, the variable "tmp" is renamed to "rsrc_data_offset" for clarity and comments have been added explaining various data. Note that this patch has one subtle difference with the previous version which should not affect functionality. In the previous code, the end of a resource was inferred from the mish block (the offsets would be increased by the fields). In this patch, the resource length is used instead to avoid the need to rely on the previous offsets. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1420566495-13284-3-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	fa8354bd22	block/dmg: properly detect the UDIF trailer DMG files have a variable length with a UDIF trailer at the end of a file. This UDIF trailer is essential as it describes the contents of the image. At the moment however, the start of this trailer is almost always incorrect as bdrv_getlength() returns a multiple of the block size (rounded up). This results in a failure to recognize DMG files, resulting in Invalid argument (EINVAL) errors. As there is no API to retrieve the real file size, look for the magic header in the last two sectors to find the start of this 512-byte UDIF trailer (the "koly" block). The resource fork offset ("info_begin") has its offset adjusted as the initial value of offset does not mean "end of file" anymore, but "begin of UDIF trailer". [Replaced error_set(errp, ERROR_CLASS_GENERIC_ERROR, ...) with error_setg(errp, ...) as discussed with Peter. --Stefan] Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1420566495-13284-2-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Francesco Romani	e2462113b2	block: add event when disk usage exceeds threshold Managing applications, like oVirt (http://www.ovirt.org), make extensive use of thin-provisioned disk images. To let the guest run smoothly and be not unnecessarily paused, oVirt sets a disk usage threshold (so called 'high water mark') based on the occupation of the device, and automatically extends the image once the threshold is reached or exceeded. In order to detect the crossing of the threshold, oVirt has no choice but aggressively polling the QEMU monitor using the query-blockstats command. This lead to unnecessary system load, and is made even worse under scale: deployments with hundreds of VMs are no longer rare. To fix this, this patch adds: * A new monitor command `block-set-write-threshold', to set a mark for a given block device. * A new event `BLOCK_WRITE_THRESHOLD', to report if a block device usage exceeds the threshold. * A new `write_threshold' field into the `BlockDeviceInfo' structure, to report the configured threshold. This will allow the managing application to use smarter and more efficient monitoring, greatly reducing the need of polling. [Updated qemu-iotests 067 output to add the new 'write_threshold' property. --Stefan] [Changed g_assert_false() to !g_assert() to fix the build on older glib versions. --Kevin] Signed-off-by: Francesco Romani <fromani@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Lieven	454057b7d9	block-backend: expose bs->bl.max_transfer_length Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Lieven	f4564d53c6	block: add accounting for merged requests Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Fam Zheng	35f5a49374	qed: Really remove unused field QEDAIOCB.finished The commit `533ffb17a` that removed qed_aiocb_info.cancel said to remove this but didn't do it. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Denis V. Lunev	1cdc3239f1	block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported. Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more mature. The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due to the following reasons: - FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes sparse. In order to retain original functionality we must allocate disk space afterwards. This is done using fallocate(0) call - fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing if called above already allocated areas of the file, i.e. the content will not be zeroed This should increase the performance a bit for not-so-modern kernels. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	d50d822219	block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes There is a possibility that we are extending our image and thus writing zeroes beyond the end of the file. In this case we do not need to care about the hole to make sure that there is no data in the file under this offset (pre-condition to fallocate(0) to work). We could simply call fallocate(0). This improves the performance of writing zeroes even on really old platforms which do not have even FALLOC_FL_PUNCH_HOLE. Before the patch do_fallocate was used when either CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined. Now the story is different. CONFIG_FALLOCATE is defined when Linux fallocate is defined, posix_fallocate is completely different story (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE thus we are on the safe side. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	b953f07500	block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes This efficiently writes zeroes on Linux if the kernel is capable enough. FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not including file expansion. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	37cc9f7f68	block/raw-posix: refactor handle_aiocb_write_zeroes a bit move code dealing with a block device to a separate function. This will allow to implement additional processing for ordinary files. Please note, that xfs_code has been moved before checking for s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside. This makes code a bit more consistent. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	0b99171230	block/raw-posix: create do_fallocate helper The pattern do { if (fallocate(s->fd, mode, offset, len) == 0) { return 0; } } while (errno == EINTR); ret = translate_err(-errno); will be commonly useful in next patches. Create helper for it. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	1486df0e31	block/raw-posix: create translate_err helper to merge errno values actually the code if (ret == -ENODEV \|\| ret == -ENOSYS \|\| ret == -EOPNOTSUPP \|\| ret == -ENOTTY) { ret = -ENOTSUP; } is present twice and will be added a couple more times. Create helper for this. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Jeff Cody	cdf9634bdf	block: vhdx - force FileOffsetMB field to '0' for certain block states The v1.0.0 spec calls out PAYLOAD_BLOCK_ZERO FileOffsetMB field as being 'reserved'. In practice, this means that Hyper-V will fail to read a disk image with PAYLOAD_BLOCK_ZERO block states with a FileOffsetMB value other than 0. The other states that indicate a block that is not there (PAYLOAD_BLOCK_UNDEFINED, PAYLOAD_BLOCK_NOT_PRESENT, PAYLOAD_BLOCK_UNMAPPED) have multiple options for what FileOffsetMB may be set to, and '0' is explicitly called out as an option. For all the above states, we will also just set the FileOffsetMB value to 0. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: a9fe92f53f07e6ab1693811e4312c0d1e958500b.1421787566.git.jcody@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-23 12:41:32 -05:00
Jeff Cody	9a29e18f7d	block: update string sizes for filename,backing_file,exact_filename The string field entries 'filename', 'backing_file', and 'exact_filename' in the BlockDriverState struct are defined as 1024 bytes. However, many places that use these values accept a maximum of PATH_MAX bytes, so we have a mixture of 1024 byte and PATH_MAX byte allocations. This patch makes the BlockDriverStruct field string sizes match usage. This patch also does a few fixes related to the size that needs to happen now: * the block qapi driver is updated to use PATH_MAX bytes * the qcow and qcow2 drivers have an additional safety check * the block vvfat driver is updated to use PATH_MAX bytes for the size of backing_file, for systems where PATH_MAX is < 1024 bytes. * qemu-img uses PATH_MAX rather than 1024. These instances were not changed to be dynamically allocated, however, as the extra temporary 3K in stack usage for qemu-img does not seem worrisome. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Jeff Cody	1d33936ea8	block: mirror - change string allocation to 2-bytes The backing_filename string in mirror_run() is only used to check for a NULL string, so we don't need to allocate 1024 bytes (or, later, PATH_MAX bytes), when we only need to copy the first 2 characters. We technically only need 1 byte, as we are just checking for NULL, but since backing_filename[] is populated by bdrv_get_backing_filename(), a string size of 1 will always only return '\0'; Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Jeff Cody	564d64bdde	block: qapi - move string allocation from stack to the heap Rather than declaring 'backing_filename2' on the stack in bdrv_query_image_info(), dynamically allocate it on the heap. Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Jeff Cody	fe2065629a	block: vmdk - move string allocations from stack to the heap Functions 'vmdk_parse_extents' and 'vmdk_create' allocate several PATH_MAX sized arrays on the stack. Make these dynamically allocated. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:05 +01:00
Jeff Cody	395a22fae0	block: vmdk - make ret variable usage clear Keep the variable 'ret' something that is returned by the function it is defined in. For the return value of 'sscanf', use a more meaningful variable name. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:05 +01:00
Max Reitz	8dd93d9339	qcow2: Add two more unalignment checks This adds checks for unaligned L2 table offsets and unaligned data cluster offsets (actually the preallocated offsets for zero clusters) to the zero cluster expansion function. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:05 +01:00
Paolo Bonzini	66552b894b	coroutine: drop qemu_coroutine_adjust_pool_size This is not needed anymore. The new TLS-based algorithm is adaptive. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417518350-6167-7-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Fam Zheng	c29c1dd312	qmp: Add command 'blockdev-backup' Similar to drive-backup, but this command uses a device id as target instead of creating/opening an image file. Also add blocker on target bs, since the target is also a named device now. Add check and report error for bs == target which became possible but is an illegal case with introduction of blockdev-backup. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1418899027-8445-3-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Vladimir Sementsov-Ogievskiy	c4237dfa63	block: fix spoiling all dirty bitmaps by mirror and migration Mirror and migration use dirty bitmaps for their purposes, and since commit [block: per caller dirty bitmap] they use their own bitmaps, not the global one. But they use old functions bdrv_set_dirty and bdrv_reset_dirty, which change all dirty bitmaps. Named dirty bitmaps series by Fam and Snow are affected: mirroring and migration will spoil all (not related to this mirroring or migration) named dirty bitmaps. This patch fixes this by adding bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent such mistakes in future, old functions bdrv_(set,reset)_dirty are made static, for internal block usage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> CC: John Snow <jsnow@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	1085daf941	block/vmdk: Relative backing file for creation When a vmdk image is created with a backing file, it is opened to check whether it is indeed a vmdk file by letting qemu probe it. When doing so, the backing filename is relative to the image's base directory so it should be interpreted accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	9f07429e88	block: JSON filenames and relative backing files When using a relative backing file name, qemu needs to know the directory of the top image file. For JSON filenames, such a directory cannot be easily determined (e.g. how do you determine the directory of a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow relative filenames for the backing file of BDSs only having a JSON filename. Furthermore, BDS::exact_filename should be used whenever possible. If BDS::filename is not equal to BDS::exact_filename, the former will always be a JSON object. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Peter Wu	debfb917a4	block/iscsi: fix uninitialized variable 'ret' was never initialized in the success path. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-03 09:22:13 +01:00
Paolo Bonzini	82595da8de	linux-aio: simplify removal of completed iocbs from the list There is no need to do another O(n) pass on the list; the iocb to split the list at is already available through the array we passed to io_submit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-6-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:57:55 +00:00
Paolo Bonzini	de35464461	linux-aio: drop return code from laio_io_unplug and ioq_submit These are unused. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-5-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:57:55 +00:00

1 2 3 4 5 ...

1843 Commits