qemu-e2k

Author	SHA1	Message	Date
Kevin Wolf	02be4aeb93	commit: Set commit_top_bs->aio_context The filter driver that is inserted by the commit job needs to use the same AioContext as its parent and child nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2017-04-07 14:44:05 +02:00
Kevin Wolf	d35ff5e6b3	block: Ignore guest dev permissions during incoming migration Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>	2017-04-07 14:44:05 +02:00
Peter Maydell	87cc4c6102	* MemoryRegionCache revert * glib optimization workaround * fix "info lapic" segfault on isapc * fix QIOChannel memory leak -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJY4oOMFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D AsIH/i52nJw41utJCs5AevnQyqNs9RnyMkZLHiVoi6a+pdJqX+0mCw8gV/5FsbPZ dtyt1tEuYBSu72adr+/ExE4aIEjwzeyRmnUdOkB+iYPxirHKuf4K/JTuLuvMtaQQ Tqj+FU5tx3wx0jlGOm5A7pzjZ680JUex+oaz3d1bZziv3zCyFCIgiZ2m2UAaaPQe fsd3fksJvc0gKOUKmdLUpu2m/xP3hAQAfQ4P/ozOfbVh9V2CVNaQ/cl935tNtdFK aYN3KleW3/ovb+YSexeNoW7QQH/3ZsjronCW5OmbF4FgHoeoV8MUROfNgu1S2bRU Bne9K/6boPzhD8NDEuSy8SXvf7s= =EdXr -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * MemoryRegionCache revert * glib optimization workaround * fix "info lapic" segfault on isapc * fix QIOChannel memory leak # gpg: Signature made Mon 03 Apr 2017 18:17:00 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: main-loop: Acquire main_context lock around os_host_main_loop_wait. exec: revert MemoryRegionCache nbd: fix memory leak on socket_connect failed ipmi: Fix macro issues target-i386: fix "info lapic" segfault on isapc iscsi: drop unused IscsiAIOCB.qiov field Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-04-04 11:40:55 +01:00
Max Reitz	86d1bd7098	block/parallels: Avoid overflows Change the types of variables in allocate_clusters() to int64_t so we do not have to worry about potential overflows. Add an assertion that our accesses to s->bat[] do not result in a buffer overflow and that the implicit conversion performed when invoking bat_entry_off() does not result in an integer overflow. Coverity-id: 1307776 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170331170512.10381-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:40 +02:00
Eric Blake	0c1bd4692f	qcow2: Discard unaligned tail when wiping image There is a subtle difference between the fast (qcow2v3 with no extra data) and slow path (qcow2v2 format [aka 0.10], or when a snapshot is present) of qcow2_make_empty(). The slow path fails to discard the final (partial) cluster of an unaligned image. The problem stems from the fact that qcow2_discard_clusters() was silently ignoring sub-cluster head and tail on unaligned requests. A quick audit of all callers shows that qcow2_snapshot_create() has always passed a cluster-aligned request since the call was added in commit 1ebf561; qcow2_co_pdiscard() has passed a cluster-aligned request since commit `ecdbead` taught the block layer about preferred discard alignment; and qcow2_make_empty() was fixed to pass an aligned start (but not necessarily end) in commit `a3e1505`. Asserting that the start is always aligned also points out that we now have a dead check: rounding the end offset down can never result in a value less than the aligned start offset (the check was rendered dead with commit `ecdbead`). Meanwhile, we do not want to round the end cluster down in the one case of the end offset matching the (unaligned) file size - that final partial cluster should still be discarded. With those fixes in place, the fast and slow paths are back in sync at discarding an entire image; the next patch will update qemu-iotests to ensure we don't regress. Note that bdrv_co_pdiscard ignores ALL partial cluster requests, including the partial cluster at the end of an image; it can be argued that the partial cluster at the end should be special-cased so that a guest issuing discard requests at proper alignments everywhere else can likewise empty the entire image. But that optimization is left for another day. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170331185356.2479-3-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:40 +02:00
Markus Armbruster	d1c136885b	sheepdog: Fix blockdev-add Commit `831acdc` "sheepdog: Implement bdrv_parse_filename()" and commit `d282f34` "sheepdog: Support blockdev-add" have different ideas on how the QemuOpts parameters for the server address are named. Fix that. While there, rename BlockdevOptionsSheepdog member addr to server, for consistency with BlockdevOptionsSsh, BlockdevOptionsGluster, BlockdevOptionsNbd. Commit 831acdc's example becomes --drive driver=sheepdog,server.type=inet,server.host=fido,server.port=7000,vdi=dolly instead of --drive driver=sheepdog,host=fido,vdi=dolly Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com> Message-id: 1490895797-29094-10-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	9445673ea6	nbd: Tidy up blockdev-add interface SocketAddress is a simple union, and simple unions are awkward: they have their variant members wrapped in a "data" object on the wire, and require additional indirections in C. I intend to limit its use to existing external interfaces, and convert all internal interfaces to SocketAddressFlat. BlockdevOptionsNbd is an external interface using SocketAddress. We already use SocketAddressFlat elsewhere in blockdev-add. Replace it by SocketAddressFlat while we can (it's new in 2.9) for simplicity and consistency. For example, { "execute": "blockdev-add", "arguments": { "node-name": "foo", "driver": "nbd", "server": { "type": "inet", "data": { "host": "localhost", "port": "12345" } } } } becomes { "execute": "blockdev-add", "arguments": { "node-name": "foo", "driver": "nbd", "server": { "type": "inet", "host": "localhost", "port": "12345" } } } Since the internal interfaces still take SocketAddress, this requires conversion function socket_address_crumple(). It'll go away when I update the interfaces. Unfortunately, SocketAddress is also visible in -drive since 2.8: -drive if=none,driver=nbd,server.type=inet,server.data.host=127.0.0.1,server.data.port=12345 Nobody should be using it, as it's fairly new and has never been documented, so adding still more compatibility gunk to keep it working isn't worth the trouble. You now have to use -drive if=none,driver=nbd,server.type=inet,server.host=127.0.0.1,server.port=12345 Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1490895797-29094-9-git-send-email-armbru@redhat.com [mreitz: Change iotest 147 accordingly] Because of this interface change, iotest 147 has to be adapted. Unfortunately, we cannot just flatten all of the addresses because nbd-server-start still takes a plain SocketAddress. Therefore, we need both and this is most easily achieved by writing the SocketAddress into the code and flattening it where necessary. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170330221243.17333-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	8bc0673f6d	qapi-schema: SocketAddressFlat variants 'vsock' and 'fd' Note that the new variants are impossible in qemu_gluster_glfs_init(), because the gconf->server can only come from qemu_gluster_parse_uri() or qemu_gluster_parse_json(), and neither can create anything but 'inet' or 'unix'. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1490895797-29094-7-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	fce5d5386d	gluster: Prepare for SocketAddressFlat extension qemu_gluster_glfs_init() and qemu_gluster_parse_json() rely on the fact that SocketAddressFlatType has only two members SOCKET_ADDRESS_FLAT_TYPE_INET and SOCKET_ADDRESS_FLAT_TYPE_UNIX. Correct, but won't stay correct. Make them more robust. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490895797-29094-6-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	129c7d1c53	block: Document -drive problematic code and bugs -blockdev and blockdev_add convert their arguments via QObject to BlockdevOptions for qmp_blockdev_add(), which converts them back to QObject, then to a flattened QDict. The QDict's members are typed according to the QAPI schema. -drive converts its argument via QemuOpts to a (flat) QDict. This QDict's members are all QString. Thus, the QType of a flat QDict member depends on whether it comes from -drive or -blockdev/blockdev_add, except when the QAPI type maps to QString, which is the case for 'str' and enumeration types. The block layer core extracts generic configuration from the flat QDict, and the block driver extracts driver-specific configuration. Both commonly do so by converting (parts of) the flat QDict to QemuOpts, which turns all values into strings. Not exactly elegant, but correct. However, A few places access the flat QDict directly: * Most of them access members that are always QString. Correct. * bdrv_open_inherit() accesses a boolean, carefully. Correct. * nfs_config() uses a QObject input visitor. Correct only because the visited type contains nothing but QStrings. * nbd_config() and ssh_config() use a QObject input visitor, and the visited types contain non-QStrings: InetSocketAddress members @numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try to use them (they're all optional). @to is ignored anyway. Reproducer: -drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p -drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4 both fail with "Invalid parameter type for 'data.ipv4', expected: boolean" Add suitable comments to all these places. Mark the buggy ones FIXME. "Fortunately", -drive's driver-specific options are entirely undocumented. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com [mreitz: Fixed two typos] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	ca0b64e5ed	nbd sockets vnc: Mark problematic address family tests TODO Certain features make sense only with certain address families. For instance, passing file descriptors requires AF_UNIX. Testing SocketAddress's saddr->type == SOCKET_ADDRESS_KIND_UNIX is obvious, but problematic: it can't recognize AF_UNIX when type == SOCKET_ADDRESS_KIND_FD. Mark such tests of saddr->type TODO. We may want to check the address family with getsockname() there. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1490895797-29094-2-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
yaolujing	cc1e139139	nbd: fix memory leak on socket_connect failed When TCP connection fails between nbd server and client, the local var, sioc, memory leak. This patch fixes the memory leak. Signed-off-by: yaolujing <yaolujing@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1491005709-29989-1-git-send-email-yaolujing@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-02 21:18:21 +02:00
Stefan Hajnoczi	b9afaba891	iscsi: drop unused IscsiAIOCB.qiov field The IscsiAIOCB.qiov field has been unused since commit `063c3378a9` ("block/iscsi: introduce bdrv_co_{readv, writev, flush_to_disk}") back in 2013. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170327165005.22038-1-stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-02 21:17:47 +02:00
Max Reitz	34634ca286	block/curl: Check protocol prefix If the user has explicitly specified a block driver and thus a protocol, we have to make sure the URL's protocol prefix matches. Otherwise the latter will silently override the former which might catch some users by surprise. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170331120431.1767-3-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-31 15:53:22 -04:00
Eric Blake	e98c6961c8	rbd: Fix regression in legacy key/values containing escaped : Commit `c7cacb3` accidentally broke legacy key-value parsing through pseudo-filename parsing of -drive file=rbd://..., for any key that contains an escaped ':'. Such a key is surprisingly common, thanks to mon_host specifying a 'host:port' string. The break happens because passing things from QDict through QemuOpts back to another QDict requires that we pack our parsed key/value pairs into a string, and then reparse that string, but the intermediate string that we created ("key1=value1:key2=value2") lost the \: escaping that was present in the original, so that we could no longer see which : were used as separators vs. those used as part of the original input. Fix it by collecting the key/value pairs through a QList, and sending that list on a round trip through a JSON QString (as in '["key1","value1","key2","value2"]') on its way through QemuOpts, rather than hand-rolling our own string. Since the string is only handled internally, this was faster than creating a full-blown struct of '[{"key1":"value1"},{"key2":"value2"}]', and safer at guaranteeing order compared to '{"key1":"value1","key2":"value2"}'. It would be nicer if we didn't have to round-trip through QemuOpts in the first place, but that's a much bigger task for later. Reproducer: ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \ -drive 'file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-e7008b209a70'\ ':id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr7BKKXg=='\ ':auth_supported=cephx\;none:mon_host=192.168.1.2\:6789'\ ',format=raw,if=none,id=drive-virtio-disk0,'\ 'serial=ea141b5c-cdb3-4765-910d-e7008b209a70,cache=writeback' Even without an RBD setup, this serves a test of whether we get the incorrect parser error of: qemu-system-x86_64: -drive file=rbd:...cache=writeback: conf option 6789 has no value or the correct behavior of hanging while trying to connect to the requested mon_host of 192.168.1.2:6789. Reported-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170331152730.12514-1-eblake@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-31 13:53:57 -04:00
Markus Armbruster	2836284db6	rbd: Fix bugs around -drive parameter "server" qemu_rbd_open() takes option parameters as a flattened QDict, with keys of the form server.%d.host, server.%d.port, where %d counts up from zero. qemu_rbd_array_opts() extracts these values as follows. First, it calls qdict_array_entries() to find the list's length. For each list element, it formats the list's key prefix (e.g. "server.0."), then creates a new QDict holding the options with that key prefix, then converts that to a QemuOpts, so it can finally get the member values from there. If there's one surefire way to make code using QDict more awkward, it's creating more of them and mixing in QemuOpts for good measure. The extraction of keys starting with server.%d into another QDict makes us ignore parameters like server.0.neither-host-nor-port silently. The conversion to QemuOpts abuses runtime_opts, as described a few commits ago. Rewrite to simply get the values straight from the options QDict. Fixes -drive not to crash when server.. are present, but server..host is absent. Fixes -drive to reject invalid server..*. Permits cleaning up runtime_opts. Do that, and fix -drive to reject bogus parameters host and port instead of silently ignoring them. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-11-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 10:01:21 -04:00
Markus Armbruster	464444fcc1	rbd: Revert -blockdev and -drive parameter auth-supported This reverts half of commit `0a55679`. We're having second thoughts on the QAPI schema (and thus the external interface), and haven't reached consensus, yet. Issues include: * The implementation uses deprecated rados_conf_set() key "auth_supported". No biggie. * The implementation makes -drive silently ignore invalid parameters "auth" and "auth-supported..X" where X isn't "auth". Fixable (in fact I'm going to fix similar bugs around parameter server), so again no biggie. BlockdevOptionsRbd member @password-secret applies only to authentication method cephx. Should it be a variant member of RbdAuthMethod? * BlockdevOptionsRbd member @user could apply to both methods cephx and none, but I'm not sure it's actually used with none. If it isn't, should it be a variant member of RbdAuthMethod? * The client offers a set of authentication methods, not a list. Should the methods be optional members of BlockdevOptionsRbd instead of members of list @auth-supported? The latter begs the question what multiple entries for the same method mean. Trivial question now that RbdAuthMethod contains nothing but @type, but less so when RbdAuthMethod acquires other members, such the ones discussed above. * How BlockdevOptionsRbd member @auth-supported interacts with settings from a configuration file specified with @conf is undocumented. I suspect it's untested, too. Let's avoid painting ourselves into a corner now, and revert the feature for 2.9. Note that users can still configure authentication methods with a configuration file. They probably do that anyway if they use Ceph outside QEMU as well. Further note that this doesn't affect use of key "auth-supported" in -drive file=rbd:...:key=value. qemu_rbd_array_opts()'s parameter @type now must be RBD_MON_HOST, which is silly. This will be cleaned up shortly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-9-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 10:01:21 -04:00
Markus Armbruster	078463977a	rbd: Clean up qemu_rbd_create()'s detour through QemuOpts The conversion from QDict to QemuOpts is pointless. Simply get the stuff straight from the QDict. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-8-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 10:00:57 -04:00
Markus Armbruster	cbf036b4f3	rbd: Clean up runtime_opts, fix -drive to reject filename runtime_opts is used for three different purposes: * qemu_rbd_open() uses it to accept options it recognizes, such as "pool" and "image". Other .bdrv_open() methods do it similarly. * qemu_rbd_open() accepts additional list-valued options auth-supported and server, with the help of qemu_rbd_array_opts(). The list elements are again dictionaries. qemu_rbd_array_opts() uses runtime_opts to accept their members. Thus, runtime_opts contains recognized sub-sub-options "auth", "host", "port" in addition to recognized options. No other block driver does that. * qemu_rbd_create() uses it to convert the QDict produced by qemu_rbd_parse_filename() to QemuOpts. No other block driver does that. The keys produced by qemu_rbd_parse_filename() are "pool", "image", "snapshot", "conf", "user" and "keyvalue-pairs". qemu_rbd_open() accepts these, so no additional ones here. This is a confusing mess. Dates back to commit `0f9d252`. First step to clean it up is documenting runtime_opts.desc[]: * Reorder entries to match the QAPI schema, like we do in other block drivers. * Document why the schema's "server" and "auth-supported" aren't in .desc[]. * Document why "keyvalue-pairs", "host", "port" and "auth" are in .desc[], but not the schema. * Delete "filename", because none of the three users actually uses it. This fixes -drive to reject parameter filename instead of silently ignoring it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-7-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	82f20e8547	rbd: Don't accept -drive driver=rbd, keyvalue-pairs=... The way we communicate extra key-value pairs from qemu_rbd_parse_filename() to qemu_rbd_open() exposes option parameter "keyvalue-pairs" on the command line. It's not wanted there. Hack: rename the parameter to "=keyvalue-pairs" to make it inaccessible. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-6-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	8efb339dd4	rbd: Clean up after the previous commit This code in qemu_rbd_parse_filename() found_str = qemu_rbd_next_tok(p, '\0', &p); p = found_str; has no effect. Drop it, and simplify qemu_rbd_next_tok(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-5-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	730b00bbfd	rbd: Don't limit length of parameter values We laboriously enforce that parameter values are between one and some arbitrary limit in length. Only RBD_MAX_IMAGE_NAME_SIZE comes from librbd.h, and I'm not sure it applies. Where the other limits come from is unclear. Drop the length checking. The limits librbd actually imposes must be checked by librbd anyway. There's one minor complication: BDRVRBDState member name is a fixed-size array. Depends on the length limit. Make it a pointer to a dynamically allocated string. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-4-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	f51c363c2b	rbd: Fix to cleanly reject -drive without pool or image qemu_rbd_open() neglects to check pool and image are present. Missing image is caught by rbd_open(), but missing pool crashes. Reproducer: $ qemu-system-x86_64 -nodefaults -drive driver=rbd,id=rbd,image=i,... terminate called after throwing an instance of 'std::logic_error' what(): basic_string::_M_construct null not valid Aborted (core dumped) where ... is a working server.0.{host,port} configuration. Doesn't affect -drive with file=..., because qemu_rbd_parse_filename() always sets both pool and image. Doesn't affect -blockdev, because pool and image are mandatory in the QAPI schema. Fix by adding the missing checks. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-3-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Denis V. Lunev	dc62da88b5	parallels: wrong call to bdrv_truncate Parallels driver should not call bdrv_truncate if the image was opened in the read-only mode. Without the patch qemu-img check harddisk.hds asserts with bdrv_truncate: Assertion `child->perm & BLK_PERM_RESIZE' failed. Parameters used on the write path are not needed if the image is opened in the read-only mode. Signed-off-by: Denis V. Lunev <den@openvz.org> Reported-by: Edgar Kaziahmedov <edos@virtuozzo.mipt.ru> Message-id: 1490625488-7980-1-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-03-28 11:06:00 +01:00
Peter Maydell	eb06c9e2d3	* MTTCG fix for win32 * virtio-scsi assertion failure * mem-prealloc coverity fix * x86 migration revert which requires more thought * x86 instruction limit (avoids >2 page translation blocks) * nbd dead code cleanup * small memory.c logic fix -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJY2Te4FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D GyIH/jMpl0w5cdW2hxzEba5alqALKx8fz8LMFy47lSndifyr74Nbk7fq9u89m9/6 3dz92sOq4ixUt8+eWEHcy0lJqucrStdMWcA7LsSIioXfgbBN39e9NfJFshXKTSQU RSL3M5f5XvYHZqHWhk/GjzlkA2l+Dq2v7FM+DT4HISnP0fjcmGXEfadfUZi6KLao 94xXGs73pTkln9jm8N1pwn3JuJ4+FbEatrvok01nmTbA7VrrBz0zVbTZjhWz7Tu/ sqBuIBAnPNKhYZFhF8GnNrXUaIciCbw13QdT047JSfpdkSQ7IUfGt7mW48X0+q9z JCHTiTZ35d7/lqeMojgl9ANUDpk= =iED8 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * MTTCG fix for win32 * virtio-scsi assertion failure * mem-prealloc coverity fix * x86 migration revert which requires more thought * x86 instruction limit (avoids >2 page translation blocks) * nbd dead code cleanup * small memory.c logic fix # gpg: Signature made Mon 27 Mar 2017 17:03:04 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: scsi-generic: Fill in opt_xfer_len in INQUIRY reply if it is zero Revert "apic: save apic_delivered flag" nbd: drop unused NBDClientSession.is_unix field win32: replace custom mutex and condition variable with native primitives mem-prealloc: fix sysconf(_SC_NPROCESSORS_ONLN) failure case. tcg/i386: Check the size of instruction being translated virtio-scsi: Fix acquire/release in dataplane handlers virtio-scsi: Make virtio_scsi_acquire/release public clear pending status before calling memory commit Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-27 17:34:50 +01:00
Peter Maydell	700f9ce0f9	block/file-posix.c: Fix unused variable warning on OpenBSD On OpenBSD none of the ioctls probe_logical_blocksize() tries exist, so the variable sector_size is unused. Refactor the code to avoid this (and reduce the duplicated code). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490279788-12995-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 17:28:34 +02:00
Kevin Wolf	e5bcf967fb	file-posix: Make bdrv_flush() failure permanent without O_DIRECT Success for bdrv_flush() means that all previously written data is safe on disk. For fdatasync(), the best semantics we can hope for on Linux (without O_DIRECT) is that all data that was written since the last call was successfully written back. Therefore, and because we can't redo all writes after a flush failure, we have to give up after a single fdatasync() failure. After this failure, we would never be able to make the promise that a successful bdrv_flush() makes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170322210005.16533-1-kwolf@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 16:53:42 +02:00
Paolo Bonzini	a12a712a7d	nbd-client: fix handling of hungup connections After the switch to reading replies in a coroutine, nothing is reentering pending receive coroutines if the connection hangs. Move nbd_recv_coroutines_enter_all to the reply read coroutine, which is the place where hangups are detected. nbd_teardown_connection can simply wait for the reply read coroutine to detect the hangup and clean up after itself. This wouldn't be enough though because nbd_receive_reply returns 0 (rather than -EPIPE or similar) when reading from a hung connection. Fix the return value check in nbd_read_reply_entry. This fixes qemu-iotests 083. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170314111157.14464-1-pbonzini@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 16:50:36 +02:00
Stefan Hajnoczi	e4548bb640	nbd: drop unused NBDClientSession.is_unix field Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170327123223.1199-1-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-27 14:41:01 +02:00
Eric Blake	67adf4b398	trace: Fix backwards mirror_yield parameters block/trace-events lists the parameters for mirror_yield consistently with other mirror events (cnt just after s, like in mirror_before_sleep; in_flight last, like in mirror_yield_in_flight). But the callers were passing parameters in the wrong order, leading to poor trace messages, including type truncation when there are more than 4G dirty sectors involved. Broken since its introduction in commit `bd48bde`. While touching this, ensure that all callers use the same type (uint64_t) for cnt, as a later patch will enable the compiler to do stricter type-checking. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-03-24 09:21:42 +00:00
Peter Maydell	e6ebebc204	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJY0rRYAAoJEL2+eyfA3jBXEvkQAJSNErQOEdqBoX/gSjeYSX85 PGp+fUrIux0HIYeKySShnsJ3Z1AuIHCogtcfEyHzTo8cDljZssgS4BRKy41ZnNaM 91Q91MgVyAEtwzApg5WNwWhTB7QDkbz7J75mTk74KPN6y9uKNbjSBRSnH4ZbIH/p L3tk6eGpHWf3N0UvoifoKpExlq0A+AYkisuZn7D9C+bBDEnEUWYRcvfEk3sKrZD/ XikclGwNSPKmdBeYenzlLHFA8WyGe85pkys6QRPeRL1l8dDBBPt1so2y4PLzaEBO fImh+ivrHHbKI5TD0RoRVsY4qi9bbH8dK1gDp0oT8uZpwIsO4EWRHA1GZRq6lVIw j7a+p/ZFBiVa2WFvWpicZppRwnkuuswqqm4NVsC1djSMoDvPeO2T24YlcRPYeYrp FVlY04HpP195mw3e7VVWlirRQY+Jo5IwJkSOUKM4xOZpKY/prS2kqT+KQq2bYK5a t3MTKwT04q/7eBtPFoJFf3gwI4q8hyizPtf4X0AN5/YREwJh7J4azQSLEJSjlo2F 37TbMqGVNQPBtwXWnfK2mi12NIHCaP/clh8hqqrQE6EdjFQQcdD5j5df5syLalTK qy+IbxpvoyNt0niXstXI62RnKDbfwsz8YtYYjVIUfv9VkpyQU1gHak7VeodRyHjz zuINtr0Jrmr47n8d9qTj =Gi6q -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Wed 22 Mar 2017 17:28:56 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: blockjob: add devops to blockjob backends block-backend: add drained_begin / drained_end ops blockjob: add block_job_start_shim blockjob: avoid recursive AioContext locking Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-23 11:39:53 +00:00
John Snow	f4d9cc88ee	block-backend: add drained_begin / drained_end ops Allow block backends to forward drain requests to their devices/users. The initial intended purpose for this patch is to allow BBs to forward requests along to BlockJobs, which will want to pause if their associated BB has entered a drained region. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-3-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-22 13:26:27 -04:00
Edgar Kaziahmedov	ff5bbe56c6	parallels: fix default options parsing parallels block driver is completely broken since commit commit `75cdcd1553` Author: Markus Armbruster <armbru@redhat.com> Date: Tue Feb 21 21:14:08 2017 +0100 option: Fix checking of sizes for overflow and trailing crap Right now even simple qemu-io -c "read 512 64k" 1.hds ends up with Unexpected error in parse_option_size() at util/qemu-option.c:188: Parameter 'prealloc-size' expects a non-negative number below 2^64 Aborted (core dumped) The cure is simple - we should use 'M' as a suffix in default option value instead of 'MiB'. Signed-off-by: Edgar Kaziahmedov <edos@virtuozzo.mipt.ru> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-id: 1490002022-22653-1-git-send-email-den@openvz.org CC: Markus Armbruster <armbru@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-03-21 10:02:36 +00:00
Paolo Bonzini	eb048026aa	curl: fix compilation on OpenBSD EPROTO is not found in OpenBSD. We usually use EIO when no better errno is available, do that here too. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170317152412.8472-1-pbonzini@redhat.com Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-17 18:27:14 +00:00
Fam Zheng	c1cef67251	block: Always call bdrv_child_check_perm first bdrv_child_set_perm alone is not very usable because the caller must call bdrv_child_check_perm first. This is already encapsulated conveniently in bdrv_child_try_set_perm, so remove the other prototypes from the header and fix the one wrong caller, block/mirror.c. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Fam Zheng	fed414df9d	file-posix: Don't leak fd in hdev_get_max_segments This fixes a leaked fd introduced in commit `9103f1ce`. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Changlong Xie	37a9051cc7	replication: clarify permissions Even if hidden_disk, secondary_disk are backing files, they all need write permissions in replication scenario. Otherwise we will encouter below exceptions on secondary side during adding nbd server: {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': true } } {"error": {"class": "GenericError", "desc": "Conflicts with use by hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}} CC: Zhang Hailiang <zhang.zhanghailiang@huawei.com> CC: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> CC: Wen Congyang <wencongyang2@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Stefan Hajnoczi	6958349085	file-posix: clean up max_segments buffer termination The following pattern is unsafe: char buf[32]; ret = read(fd, buf, sizeof(buf)); ... buf[ret] = 0; If read(2) returns 32 then a byte beyond the end of the buffer is zeroed. In practice this buffer overflow does not occur because the sysfs max_segments file only contains an unsigned short + '\n'. The string is always shorter than 32 bytes. Regardless, avoid this pattern because static analysis tools might complain and it could lead to real buffer overflows if copy-pasted elsewhere in the codebase. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Kevin Wolf	dcbf37ce41	commit: Implement .bdrv_refresh_filename We want query-block to return the right filename, even if a commit job put a bdrv_commit_top on top of the actual image format driver. Let bdrv_commit_top.bdrv_refresh_filename get the filename from its backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00
Kevin Wolf	fd4a6493bb	mirror: Implement .bdrv_refresh_filename We want query-block to return the right filename, even if a mirror job put a bdrv_mirror_top on top of the actual image format driver. Let bdrv_mirror_top.bdrv_refresh_filename get the filename from its backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00
Kevin Wolf	9196565866	commit: Implement bdrv_commit_top.bdrv_co_get_block_status In some cases, bdrv_co_get_block_status() is called recursively for the whole backing chain. The automatically inserted bdrv_commit_top filter driver must not stop the recursion, so implement a callback that simply forwards the request to bs->backing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00
Kevin Wolf	b64aa44195	block: Request block status from *file for BDRV_BLOCK_RAW This fixes bdrv_co_get_block_status() for the bdrv_mirror_top block driver, which must fall through to bs->backing instead of bs->file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00
Eric Blake	6f712ee080	vvfat: React to bdrv_is_allocated() errors If bdrv_is_allocated() fails, we should react to that failure. For 2 of the 3 callers, reporting the error was easy. But in cluster_was_modified() and its lone caller get_cluster_count_for_direntry(), it's rather invasive to update the logic to pass the error back; so there, I went with merely documenting the issue by changing the return type to bool (in all likelihood, treating the cluster as modified will then trigger a read which will also fail, and eventually get to an error - but given the appalling number of abort() calls in this code, I'm not making it any worse). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-13 12:49:33 +01:00
Eric Blake	666a9543fa	backup: React to bdrv_is_allocated() errors If bdrv_is_allocated() fails, we should immediately do the backup error action, rather than attempting backup_do_cow() (although that will likely fail too). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-13 12:49:33 +01:00
Eric Blake	e32ccbc6e9	block: Drop unmaintained 'archipelago' driver The driver has failed to build since commit `da34e65`, in qemu 2.6, due to a missing include of qapi/error.h for error_setg(). Since no one has complained in three releases, it is easier to remove the dead code than to keep it around, especially since it is not being built by default and therefore prone to bitrot. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-13 12:49:33 +01:00
Fam Zheng	9103f1ceb4	file-posix: Consider max_segments for BlockLimits.max_transfer BlockLimits.max_transfer can be too high without this fix, guest will encounter I/O error or even get paused with werror=stop or rerror=stop. The cause is explained below. Linux has a separate limit, /sys/block/.../queue/max_segments, which in the worst case can be more restrictive than the BLKSECTGET which we already consider (note that they are two different things). So, the failure scenario before this patch is: 1) host device has max_sectors_kb = 4096 and max_segments = 64; 2) guest learns max_sectors_kb limit from QEMU, but doesn't know max_segments; 3) guest issues e.g. a 512KB request thinking it's okay, but actually it's not, because it will be passed through to host device as an SG_IO req that has niov > 64; 4) host kernel doesn't like the segmenting of the request, and returns -EINVAL; This patch checks the max_segments sysfs entry for the host device and calculates a "conservative" bytes limit using the page size, which is then merged into the existing max_transfer limit. Guest will discover this from the usual virtual block device interfaces. (In the case of scsi-generic, it will be done in the INQUIRY reply interception in device model.) The other possibility is to actually propagate it as a separate limit, but it's not better. On the one hand, there is a big complication: the limit is per-LUN in QEMU PoV (because we can attach LUNs from different host HBAs to the same virtio-scsi bus), but the channel to communicate it in a per-LUN manner is missing down the stack; on the other hand, two limits versus one doesn't change much about the valid size of I/O (because guest has no control over host segmenting). Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL, was explored. Unfortunately there is no neat way to ensure the bounce buffer is less segmented (in terms of DMA addr) than the guest buffer. Practically, this bug is not very common. It is only reported on a Emulex (lpfc), so it's okay to get it fixed in the easier way. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-13 12:49:33 +01:00
Vladimir Sementsov-Ogievskiy	a410a7f1af	backup: allow target without .bdrv_get_info Currently backup to nbd target is broken, as nbd doesn't have .bdrv_get_info realization. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-13 12:49:33 +01:00
Fam Zheng	b69f00dde4	commit: Don't use error_abort in commit_start bdrv_set_backing_hd failure needn't be abort. Since we already have error parameter, use it. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:29 +01:00
Fam Zheng	50bfbe93b2	block: Don't use error_abort in blk_new_open We have an errp and bdrv_root_attach_child can fail permission check, error_abort is not the best choice here. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:29 +01:00
Markus Armbruster	c5f1ae3ae7	qapi-schema: Rename SocketAddressFlat's variant tcp to inet QAPI type SocketAddressFlat differs from SocketAddress pointlessly: the discriminator value for variant InetSocketAddress is 'tcp' instead of 'inet'. Rename. The type is so far only used by the Gluster block drivers. Take care to keep 'tcp' working in things like -drive's file.server.0.type=tcp. The "gluster+tcp" URI scheme in pseudo-filenames stays the same. blockdev-add changes, but it has changed incompatibly since 2.8 already. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:29 +01:00
Markus Armbruster	2b733709d7	qapi-schema: Rename GlusterServer to SocketAddressFlat As its documentation says, it's not specific to Gluster. Rename it, as I'm going to use it for something else. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:29 +01:00
Markus Armbruster	85a82e852d	gluster: Plug memory leaks in qemu_gluster_parse_json() To reproduce, run $ valgrind qemu-system-x86_64 --nodefaults -S --drive driver=gluster,volume=testvol,path=/a/b/c,server.0.type=xxx Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:29 +01:00
Markus Armbruster	e152ef7a98	gluster: Don't duplicate qapi-util.c's qapi_enum_parse() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:29 +01:00
Markus Armbruster	fc29458dee	gluster: Drop assumptions on SocketTransport names qemu_gluster_glfs_init() passes the names of QAPI enumeration type SocketTransport to glfs_set_volfile_server(). Works, because they were chosen to match. But the coupling is artificial. Use the appropriate literal strings instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	831acdc95e	sheepdog: Implement bdrv_parse_filename() This permits configuration with driver-specific options in addition to pseudo-filename parsed as URI. For instance, --drive driver=sheepdog,host=fido,vdi=dolly instead of --drive driver=sheepdog,file=sheepdog://fido/dolly It's also a first step towards supporting blockdev-add. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	8ecc2f9eab	sheepdog: Use SocketAddress and socket_connect() sd_parse_uri() builds a string from host and port parts for inet_connect(). inet_connect() parses it into host, port and options. Whether this gets exactly the same host, port and no options for all inputs is not obvious. Cut out the string middleman and build a SocketAddress for socket_connect() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	36bcac16fd	sheepdog: Report errors in pseudo-filename more usefully Errors in the pseudo-filename are all reported with the same laconic "Can't parse filename" message. Add real error reporting, such as: $ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepdog:/// qemu-system-x86_64: --drive driver=sheepdog,filename=sheepdog:///: missing file path in URI $ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepgod:///vdi qemu-system-x86_64: --drive driver=sheepdog,filename=sheepgod:///vdi: URI scheme must be 'sheepdog', 'sheepdog+tcp', or 'sheepdog+unix' $ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock qemu-system-x86_64: --drive driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock: unexpected query parameters The code to translate legacy syntax to URI fails to escape URI meta-characters. The new error messages are misleading then. Replace them by the old "Can't parse filename" message. "Internal error" would be more honest. Anyway, no worse than before. Also add a FIXME comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	daa0b0d4b1	sheepdog: Don't truncate long VDI name in _open(), _create() sd_parse_uri() truncates long VDI names silently. Reject them instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	89e2a31d33	sheepdog: Fix snapshot ID parsing in _open(), _create, _goto() sd_parse_uri() and sd_snapshot_goto() screw up error checking after strtoul(), and truncate long tag names silently. Fix by replacing those parts by new sd_parse_snapid_or_tag(), which checks more carefully. sd_snapshot_delete() also parses snapshot IDs, but is currently too broken for me to touch. Mark TODO. Two calls of strtol() without error checking remain in parse_redundancy(). Mark them FIXME. More silent truncation of configuration strings remains elsewhere. Not marked. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	a0dc0e2bfe	sheepdog: Mark sd_snapshot_delete() lossage FIXME sd_snapshot_delete() should delete the snapshot whose ID matches @snapshot_id and whose name matches @name. But that's not what it does. If @snapshot_id is a valid ID, it deletes the snapshot with that ID, else it deletes the snapshot with that name. It doesn't use @name at all. Add suitable FIXME comments, so someone who actually knows Sheepdog can fix it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	48d7c4af06	sheepdog: Fix error handling sd_create() As a bdrv_create() method, sd_create() must set an error and return negative errno on failure. It prints the error instead of setting it when connect_to_sdog() fails. Fix that. While there, return the value of connect_to_sdog() like we do elsewhere, instead of -EIO. No functional change, as connect_to_sdog() returns no other error code. Many more suspicious uses of error_report() and error_report_err() remain in other functions. Left for another day. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	e25cad6921	sheepdog: Fix error handling in sd_snapshot_delete() As a bdrv_snapshot_delete() method, sd_snapshot_delete() must set an error and return negative errno on failure. It sometimes returns -1, and sometimes neglects to set an error. It also prints error messages with error_report(). Fix all that. Moreover, its handling of an attempt to delete a nonexistent snapshot is wrong: it error_report()s and succeeds. Fix it to set an error and return -ENOENT instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Markus Armbruster	cbc488ee2a	sheepdog: Defuse time bomb in sd_open() error handling When qemu_opts_absorb_qdict() fails, sd_open() closes stdin, because sd->fd is still zero. Fortunately, qemu_opts_absorb_qdict() can't fail, because: 1. it only fails when qemu_opt_parse() fails, and 2. the only member of runtime_opts.desc[] is a QEMU_OPT_STRING, and 3. qemu_opt_parse() can't fail for QEMU_OPT_STRING. Defuse this ticking time bomb by jumping behind the file descriptor cleanup on error. Also do that for the error paths where sd->fd is still -1. The file descriptor cleanup happens to do nothing then, but let's not rely on that here. While there, rename label out to err, because it's on the error path, not the normal path out of the function. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:28 +01:00
Kevin Wolf	5fe31c25cc	block: Fix error handling in bdrv_replace_in_backing_chain() When adding an Error parameter, bdrv_replace_in_backing_chain() would become nothing more than a wrapper around change_parent_backing_link(). So make the latter public, renamed as bdrv_replace_node(), and remove bdrv_replace_in_backing_chain(). Most of the callers just remove a node from the graph that they just inserted, so they can use &error_abort, but completion of a mirror job with 'replaces' set can actually fail. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-07 14:53:28 +01:00
Kevin Wolf	88f9d1b3d2	mirror: Fix error path for dirty bitmap creation mirror_top_bs must be removed from the graph again when creating the dirty bitmap fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-07 14:53:28 +01:00
Kevin Wolf	0bf74767ff	mirror: Fix permissions for removing mirror_top_bs mirror_top_bs takes write permissions on its backing file, which can make it impossible to attach that backing file node to another parent. However, this is exactly what needs to be done in order to remove mirror_top_bs from the backing chain. So give up the write permission first. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-07 14:53:28 +01:00
Kevin Wolf	7d9fcb391c	mirror: Fix permission problem with 'replaces' The 'replaces' option of drive-mirror can be used to mirror a Quorum node to a new image and then let the target image replace one of the Quorum children. In order for this graph modification to succeed, the mirror job needs to lift its restrictions on the target node first before actually replacing the child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-07 14:53:28 +01:00
Kevin Wolf	b247767aac	commit: Fix error handling Apparently some kind of mismerge happened in commit `8dfba279`, which broke the error handling without any real reason by removing the assignment of the return value to ret in a blk_insert_bs() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2017-03-07 14:53:28 +01:00
Markus Armbruster	048abb7b20	qapi: Drop unused non-strict qobject input visitor The split between tests/test-qobject-input-visitor.c and tests/test-qobject-input-strict.c now makes less sense than ever. The next commit will take care of that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1488544368-30622-20-git-send-email-armbru@redhat.com>	2017-03-05 09:14:19 +01:00
Paolo Bonzini	f6eb0b319e	iscsi: fix missing unlock Reported by Coverity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-03 16:41:20 +01:00
Peter Maydell	54639aed97	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYt5TDAAoJEL2+eyfA3jBXL7UP/ROAc7tc0e4F2+5lPE88bLfJ i4dBI092Qt4LG73POoQDbgpYahs5ywLYE7MAJP2GmgiiUOj4XFN/K/d6wJluqrGj mLiOTlMcYmr2Kbi5CMfZBWlkd27GD/g1knCzpqtAOpM/zIxAwnRr4Mq8RXAHSgio tTvgtNr5JG9tYU088COwY7otr6WDhIMwESN5KQcLiOxmo4GtNkpBmpwfMdsgij9X mxfjQSdWzutITgsA28JiOMfWMbCiLYkwUgo/KCVIG4aB9mNKgEOx6YTIuGXspx2E 8DaGLW54KagsuAhEHFCKFecmYCRjaDCZhwcMEEdif+mn1k1bJGtjK+roNFvn2vEy 8qdKDTn4B7f6uR0fSGNviti04NNpCZTlPmhWpupqRWetY9vCdYNoRjtxRgh6t7wd rziBkN7FW/yxvr2GEXb5lLtH0YN+YTOw6FIXoRS8zKoNNtNuJaKxymfQ57VHxF0V BG9A+XMAIPgCudoqtBIpyaNxlByqhFKm6Z7aMNzzdZoohANtQxsAO6mAxFKk0yTp 3zhOywglHd8wtmTVnZEtQTpdqk/Swd9/qCig6hHq+3U3QydyrHM7ZZbB/Z22xiaO 53DDiLvPr8VbwCZQcpjXg0YOcQXTEMY/xzxDn5AQg10nyWYBr7HKKTxfmT1aew/N CRQi8CycU0TVneRnPMsg =LvuA -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Thu 02 Mar 2017 03:42:59 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: block/rbd: add support for 'mon_host', 'auth_supported' via QAPI block/rbd: add blockdev-add support block/rbd: parse all options via bdrv_parse_filename block/rbd: add all the currently supported runtime_opts block/rbd: don't copy strings in qemu_rbd_next_tok() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-02 23:20:37 +00:00
Jeff Cody	0a55679b4a	block/rbd: add support for 'mon_host', 'auth_supported' via QAPI This adds support for three additional options that may be specified by QAPI in blockdev-add: server: host, port auth method: either 'cephx' or 'none' The "server" and "auth-supported" QAPI parameters are arrays. To conform with the rados API, the array items are join as a single string with a ';' character as a delimiter when setting the configuration values. Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-01 22:39:25 -05:00
Kevin Wolf	b2c2832c61	block: Add Error parameter to bdrv_append() Aborting on error in bdrv_append() isn't correct. This patch fixes it and lets the callers handle failures. Test case 085 needs a reference output update. This is caused by the reversed order of bdrv_set_backing_hd() and change_parent_backing_link() in bdrv_append(): When the backing file of the new node is set, the parent nodes are still pointing to the old top, so the backing blocker is now initialised with the node name rather than the BlockBackend name. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:51 +01:00
Kevin Wolf	12fa4af61f	block: Add Error parameter to bdrv_set_backing_hd() Not all callers of bdrv_set_backing_hd() know for sure that attaching the backing file will be allowed by the permission system. Return the error from the function rather than aborting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:51 +01:00
Kevin Wolf	c8f6d58edb	block: Assertions for resize permission This adds an assertion that ensures that the necessary resize permission has been granted before bdrv_truncate() is called. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	afa4b29323	block: Assertions for write permissions This adds assertions that ensure that the necessary write permissions have been granted before someone attempts to write to a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	85c97ca7a1	block: Pass BdrvChild to bdrv_aligned_preadv/pwritev and copy-on-read This is where we want to check the permissions, so we need to have the BdrvChild around where they are stored. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	887354bd13	hmp: Request permissions in qemu-io The HMP command 'qemu-io' is a bit tricky because it wants to work on the original BlockBackend, but additional permissions could be required. The details are explained in a comment in the code, but in summary, just request whatever permissions the current qemu-io command needs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	0db832f42e	commit: Add filter-node-name to block-commit Management tools need to be able to know about every node in the graph and need a way to address them. Changing the graph structure was okay because libvirt doesn't really manage the node level yet, but future libvirt versions need to deal with both new and old version of qemu. This new option to blockdev-commit allows the client to set a node-name for the automatically inserted filter driver, and at the same time serves as a witness for a future libvirt that this version of qemu does automatically insert a filter driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	6cdbceb12c	mirror: Add filter-node-name to blockdev-mirror Management tools need to be able to know about every node in the graph and need a way to address them. Changing the graph structure was okay because libvirt doesn't really manage the node level yet, but future libvirt versions need to deal with both new and old version of qemu. This new option to blockdev-mirror allows the client to set a node-name for the automatically inserted filter driver, and at the same time serves as a witness for a future libvirt that this version of qemu does automatically insert a filter driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	a170a91fd3	stream: Use real permissions in streaming block job The correct permissions are relatively obvious here (and explained in code comments). For intermediate streaming, we need to reopen the top node read-write before creating the job now because the permissions system catches attempts to get the BLK_PERM_WRITE_UNCHANGED permission on a read-only node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	4ef85a9c23	mirror: Use real permissions in mirror/active commit block job The mirror block job is mainly used for two different scenarios: Mirroring to an otherwise unused, independent target node, or for active commit where the target node is part of the backing chain of the source. Similarly to the commit block job patch, we need to insert a new filter node to keep the permissions correct during active commit. Note that one change this implies is that job->blk points to mirror_top_bs as its root now, and mirror_top_bs (rather than the actual source node) contains the bs->job pointer. This requires qemu-img commit to get the job by name now rather than just taking bs->job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:46 +01:00
Kevin Wolf	4e9e4323d5	backup: Use real permissions in backup block job The backup block job doesn't have very complicated requirements: It needs to read from the source and write to the target, but it's fine with either side being changed. The only restriction is that we can't resize the image because the job uses a cached value. qemu-iotests 055 needs to be changed because it used a target which was already attached to a virtio-blk device. The permission system correctly forbids this (virtio-blk can't accept another writer with its default share-rw=off). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	d3f0675922	commit: Use real permissions for HMP 'commit' This is a little simpler than the commit block job because it's synchronous and only commits into the immediate backing file, but otherwise doing more or less the same. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	8dfba27977	commit: Use real permissions in commit block job This is probably one of the most interesting conversions to the new op blocker system because a commit block job intentionally leaves some intermediate block nodes in the backing chain that aren't valid on their own any more; only the whole chain together results in a valid view. In order to provide the 'consistent read' permission to the parents of the 'top' node of the commit job, a new filter block driver is inserted above 'top' which doesn't require 'consistent read' on its backing chain. Subsequently, the commit job can block 'consistent read' on all intermediate nodes without causing a conflict. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	76d554e20b	blockjob: Add permissions to block_job_add_bdrv() Block jobs don't actually do I/O through the the reference they create with block_job_add_bdrv(), but they might want to use the permisssion system to express what the block job does to intermediate nodes. This adds permissions to block_job_add_bdrv() to provide the means to request permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	b541155587	block: Add BdrvChildRole.get_parent_desc() For meaningful error messages in the permission system, we need to get some human-readable description of the parent of a BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	c6cc12bfa7	blockjob: Add permissions to block_job_create() This functions creates a BlockBackend internally, so the block jobs need to tell it what they want to do with the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	39829a01ae	block: Allow error return in BlockDevOps.change_media_cb() Some devices allow a media change between read-only and read-write media. They need to adapt the permissions in their .change_media_cb() implementation, which can fail. So add an Error parameter to the function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	c62d32f503	block: Request real permissions in blk_new_open() We can figure out the necessary permissions from the flags that the caller passed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	55880601d8	block: Add BDRV_O_RESIZE for blk_new_open() blk_new_open() is a convenience function that processes flags rather than QDict options as a simple way to just open an image file. In order to keep it convenient in the future, it must automatically request the necessary permissions. This can easily be inferred from the flags for read and write, but we need another flag that tells us whether to get the resize permission. We can't just always request it because that means that no block jobs can run on the resulting BlockBackend (which is something that e.g. qemu-img commit wants to do), but we also can't request it never because most of the .bdrv_create() implementations call blk_truncate(). The solution is to introduce another flag that is passed by all users that want to resize the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	d7086422b1	block: Add error parameter to blk_insert_bs() Now that blk_insert_bs() requests the BlockBackend permissions for the node it attaches to, it can fail. Instead of aborting, pass the errors to the callers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	6d0eb64d5c	block: Add permissions to blk_new() We want every user to be specific about the permissions it needs, so we'll pass the initial permissions as parameters to blk_new(). A user only needs to call blk_set_perm() if it wants to change the permissions after the fact. The permissions are stored in the BlockBackend and applied whenever a BlockDriverState should be attached in blk_insert_bs(). This does not include actually choosing the right set of permissions everywhere yet. Instead, the usual FIXME comment is added to each place and will be addressed in individual patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	981776b348	block: Add permissions to BlockBackend The BlockBackend can now store the permissions that its user requires. This is necessary because nodes can be ejected from or inserted into a BlockBackend and all of these operations must make sure that the user still gets what it requested initially. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	91ef38257a	vvfat: Implement .bdrv_child_perm() vvfat is the last remaining driver that can have children, but doesn't implement .bdrv_child_perm() yet. The default handlers aren't suitable here, so let's implement a very simple driver-specific one that protects the internal child from being used by other users as good as our permissions permit. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	862f215fab	block: Request child permissions in format drivers This makes use of the .bdrv_child_perm() implementation for formats that we just added. All format drivers expose the permissions they actually need nows, so that they can be set accordingly and updated when parents are attached or detached. The only format not included here is raw, which was already converted with the other filter drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	d7010dfb68	block: Request child permissions in filter drivers All callers will have to request permissions for all of their child nodes. Block drivers that act as simply filters can use the default implementation of .bdrv_child_perm(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	d5e6f437c5	block: Let callers request permissions when attaching a child node When attaching a node as a child to a new parent, the required and shared permissions for this parent are checked against all other parents of the node now, and an error is returned if there is a conflict. This allows error returns to a function that previously always succeeded, and the same is true for quite a few callers and their callers. Converting all of them within the same patch would be too much, so for now everyone tells that they don't need any permissions and allow everyone else to do anything. This way we can use &error_abort initially and convert caller by caller to pass actual permission requirements and implement error handling. All these places are marked with FIXME comments and it will be the job of the next patches to clean them up again. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	8b2ff5291f	block: Add Error argument to bdrv_attach_child() It will have to return an error soon, so prepare the callers for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:35 +01:00
Jeff Cody	c7cacb3e7a	block/rbd: parse all options via bdrv_parse_filename Get rid of qemu_rbd_parsename in favor of bdrv_parse_filename. This simplifies a lot of the parsing as well, as we can treat everything a bit simpler since nonexistent options are simply NULL pointers instead of empty strings. An important item to note: Ceph has many extra option values that can be specified as key/value pairs. This was handled previously in the driver by extracting the values that the QEMU driver cared about, and then blindly passing all extra options to rbd after splitting them into key/value pairs, and cleaning up any special character escaping. The practice is continued in this patch; there is an option "keyvalue-pairs" that is populated with all the key/value pairs that the QEMU driver does not care about. These key/value pairs will override any settings in the 'conf' configuration file, just as they did before. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-28 11:32:31 -05:00
Jeff Cody	0f9d252de4	block/rbd: add all the currently supported runtime_opts This adds all the currently supported runtime opts, which are the options as parsed from the filename. All of these options are explicitly checked for during during runtime, with an exception to the "keyvalue-pairs" option. This option contains all the key/value pairs that the QEMU rbd driver merely unescapes, and passes along blindly to rados. This option is a "legacy" option, and will not be exposed in the QAPI or available for introspection. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-28 11:32:25 -05:00
Jeff Cody	7830f90998	block/rbd: don't copy strings in qemu_rbd_next_tok() This patch is prep work for parsing options for .bdrv_parse_filename, and using QDict options. The function qemu_rbd_next_tok() searched for various key/value pairs, and copied them into buffers. This will soon be an unnecessary extra step, so we will now return found strings by reference only, and offload the responsibility for safely handling/coping these strings to the caller. This also cleans up error handling some, as the callers now rely on the Error object to determine if there is a parse error. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-28 11:32:06 -05:00
Peter Maydell	c8c0a1a784	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYtP3aAAoJEL2+eyfA3jBXBBIQAJ+Cuv7NTZYbRXTe/soZ5Lz5 rO6fNsfHj7b/xx5Sti+vSMvq7EpVpQ51qcf0XrVNh+BGCElW43st2567tA5PGil/ brqWgVCP4MygFVGm9Fbk6txVwnotvnwkw417TVKcaeYeq8hfTo7CyjQK8+hg+C0A 0B//4ARg8mM0YYg9NL4ihjNP/3E6Js9oAneuADv42M7SwB89wiIwh3tHC0pXDJbt IuJCqdR7/yaJyK8ccbDuUkY/qMEDK5VUQDVloPvvWrCcZKVWt/kFh7+5bTIzCu6D 4s4k1j2L5kV00JnnzH6FNAWp3VY2CFdHDGrFV6rFiOiSo5tcGv1K4uINiq7KeEPL kIefFhlTgIKbSAgqk1fDSh4y96Eo3Vez0bJ5KnPvMx4z5EmN5HjiCW0qibmbCM5z uhTKZbZAI90KUAPjkGgJf8URkJjAGxdJQD7W3d7jx4xn010bKOQYjXmuX3qzoYiP vENak0sG9wN5QU2KIDymUd+xE4BgBo2cq+7CpOu3CyO+9O8ESm820vLZA5cdtr7N 5iqeDsBMJjL3LX8o3cuqw+DLKF3vRqZdw4PhuZSaM8ccqCuGSyyywvtoZ4hOYkwS ZGa1nABmiFaMmxl3FHj1/0MF4X4yAwj2Ae5ublHoIlGpiNpZartlVu+D/sGtQ3SC +MOqFgUhc2oRRKqflsn4 =tgky -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Tue 28 Feb 2017 04:34:34 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: iscsi: add missing colons to the qapi docs block/mirror: fix broken sparseness detection Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-28 13:41:03 +00:00
John Snow	39c11580f3	block/mirror: fix broken sparseness detection int64_t is in all likelihood the actual scalar type we want. Yep, really. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1219541 Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-27 14:02:31 -05:00
Paolo Bonzini	d045c466d9	iscsi: do not use aio_context_acquire/release Now that all bottom halves and callbacks take care of taking the AioContext lock, we can migrate some users away from it and to a specific QemuMutex or CoMutex. Protect libiscsi calls with a QemuMutex. Callbacks are invoked using bottom halves, so we don't even have to drop it around callback invocations. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170222180725.28611-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-27 13:58:58 +00:00
Paolo Bonzini	37d1e4d9bf	nfs: do not use aio_context_acquire/release Now that all bottom halves and callbacks take care of taking the AioContext lock, we can migrate some users away from it and to a specific QemuMutex or CoMutex. Protect libnfs calls with a QemuMutex. Callbacks are invoked using bottom halves, so we don't even have to drop it around callback invocations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170222180725.28611-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-27 13:58:53 +00:00
Paolo Bonzini	ba3186c4e4	curl: do not use aio_context_acquire/release Now that all bottom halves and callbacks take care of taking the AioContext lock, we can migrate some users away from it and to a specific QemuMutex or CoMutex. Protect BDRVCURLState access with a QemuMutex. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170222180725.28611-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-27 13:33:24 +00:00
Peter Maydell	6b4e463ff3	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJYsHaaAAoJEH8JsnLIjy/WvC4P/iw/WC6XySNzPMOzICrJr9fc IHz447+0MqoIBWqGRFLEU8z4t6k8HGt2YnWKFuS1N+2gU5fSgOdp20nAA+pQvxRb RTyQL7BNJPvFNzrEQPTpdvBt79veYsoNe5dNfSq01z/PdgxMQR4PkS96Jm8w4Jdu 20ZPfULws8+Wq1Snsxl4PdnFpY2n97OOGQRCjl50h5ypol5+fXDDzAvp/GPf6Q8B 09OTWmQ4UIr4OZqT83T1kDdRZRRMIxiRP5qTTKdh0BlJEQdBjrJ9fTClY4bMcIgl VP/3kzkmdoqSW+4D+0L88q7vyvM3Kwc6n2PFDaRf6Lgy9ueYoPKuW9AF5vzcxhED AI0SN9boM+4z1P75kalBaHg/BiRL7UDwpHgdanPVcENoP8IywsKzOIvdjOpqINmX uLU3QfK+qK6x7gflhVXiX2sFzTLzZQlWu5KPjW6QfuMzUuRAeksfMDXh2GXNFVG1 igGnORyqB2nOGNAsudtMrOvjHQRSbwGsnvWcwvaX0swrxOb3oAwc3X+E4nxE/K5/ wJQLrCM9DfD3CKKlKiu+1wQXx6zqaGYIsNxLcrSylJ3TPsGuxfkx3b9GVCCQ0Y+e YSdwWSAIS9LxR4qISJ5ehlnYDA0sRtdco0xGW6ACNE5IrgREYUcH4EJOXbxZ4NHs BK/DVdYhFtNZ60zHeEn4 =OYgF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Fri 24 Feb 2017 18:08:26 GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: tests: Use opened block node for block job tests vvfat: Use opened node as backing file block: Add bdrv_new_open_driver() block: Factor out bdrv_open_driver() block: Use BlockBackend for image probing block: Factor out bdrv_open_child_bs() block: Attach bs->file only during .bdrv_open() block: Pass BdrvChild to bdrv_truncate() mirror: Resize active commit base in mirror_run() qcow2: Use BB for resizing in qcow2_amend_options() blockdev: Use BlockBackend to resize in qmp_block_resize() iotests: Fix another race in 030 qemu-img: Improve documentation for PREALLOC_MODE_FALLOC qemu-img: Truncate before full preallocation qemu-img: Add tests for raw image preallocation qemu-img: Do not truncate before preallocation qemu-iotests: redirect nbd server stdout to /dev/null qemu-iotests: add ability to exclude certain protocols from tests qemu-iotests: Test 137 only supports 'file' protocol Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-26 12:26:37 +00:00
tianqing	1d393bdeae	RBD: Add support readv,writev for rbd Rbd can do readv and writev directly, so wo do not need to transform iov to buf or vice versa any more. Signed-off-by: tianqing <tianqing@unitedstack.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-24 12:43:01 -05:00
Peter Lieven	ef503a8417	block/nfs: try to avoid the bounce buffer in pwritev if the passed qiov contains exactly one iov we can pass the buffer directly. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1487349541-10201-3-git-send-email-pl@kamp.de Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-24 12:38:35 -05:00
Peter Lieven	69785a229d	block/nfs: convert to preadv / pwritev Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1487349541-10201-2-git-send-email-pl@kamp.de Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-24 12:38:35 -05:00
Kevin Wolf	a8a4d15c1c	vvfat: Use opened node as backing file We should not try to assign a not yet opened node as the backing file, because as soon as the permission system is added it will fail. The just added bdrv_new_open_driver() function is the right tool to open a file with an internal driver, use it. In case anyone wonders whether that magic fake backing file to trigger a special action on 'commit' actually works today: No, not for me. One reason is that we've been adding a raw format driver on top for several years now and raw doesn't support commit. Other reasons include that the backing file isn't writable and the driver doesn't support reopen, and it's also size 0 and the driver doesn't support bdrv_truncate. All of these are easily fixable, but then 'commit' ended up in an infinite loop deep in the vvfat code for me, so I thought I'd best leave it alone. I'm not really sure what it was supposed to do anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	4e4bf5c42c	block: Attach bs->file only during .bdrv_open() The way that attaching bs->file worked was a bit unusual in that it was the only child that would be attached to a node which is not opened yet. Because of this, the block layer couldn't know yet which permissions the driver would eventually need. This patch moves the point where bs->file is attached to the beginning of the individual .bdrv_open() implementations, so drivers already know what they are going to do with the child. This is also more consistent with how driver-specific children work. For a moment, bdrv_open() gets its own BdrvChild to perform image probing, but instead of directly assigning this BdrvChild to the BDS, it becomes a temporary one and the node name is passed as an option to the drivers, so that they can simply use bdrv_open_child() to create another reference for their own use. This duplicated child for (the not opened yet) bs is not the final state, a follow-up patch will change the image probing code to use a BlockBackend, which is completely independent of bs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	52cdbc5869	block: Pass BdrvChild to bdrv_truncate() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	becc347e1c	mirror: Resize active commit base in mirror_run() This is more consistent with the commit block job, and it moves the code to a place where we already have the necessary BlockBackends to resize the base image when bdrv_truncate() is changed to require a BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Kevin Wolf	70b27f3643	qcow2: Use BB for resizing in qcow2_amend_options() In order to able to convert bdrv_truncate() to take a BdrvChild and later to correctly check the resize permission here, we need to use a BlockBackend for resizing the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Nir Soffer	c6ccc2c5e6	qemu-img: Improve documentation for PREALLOC_MODE_FALLOC Now that we are truncating the file in both PREALLOC_MODE_FULL and PREALLOC_MODE_OFF, not truncating in PREALLOC_MODE_FALLOC looks odd. Add a comment explaining why we do not truncate in this case. Signed-off-by: Nir Soffer <nirsof@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-02-24 16:09:23 +01:00
Nir Soffer	5a1dad9d5a	qemu-img: Truncate before full preallocation In a previous commit (qemu-img: Do not truncate before preallocation) we moved truncate to the PREALLOC_MODE_OFF branch to avoid slowdown in posix_fallocate(). However this change is not optimal when using PREALLOC_MODE_FULL, since knowing the final size from the beginning could allow the file system driver to do less allocations and possibly avoid fragmentation of the file. Now we truncate also before doing full preallocation. Signed-off-by: Nir Soffer <nirsof@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-02-24 16:09:23 +01:00
Nir Soffer	f6a7240442	qemu-img: Do not truncate before preallocation When using file system that does not support fallocate() (e.g. NFS < 4.2), truncating the file only when preallocation=OFF speeds up creating raw file. Here is example run, tested on Fedora 24 machine, creating raw file on NFS version 3 server. $ time ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 1g Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc real 0m21.185s user 0m0.022s sys 0m0.574s $ time ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 1g Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc real 0m11.601s user 0m0.016s sys 0m0.525s $ time dd if=/dev/zero of=mnt/test bs=1M count=1024 oflag=direct 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6627 s, 68.6 MB/s real 0m16.104s user 0m0.009s sys 0m0.220s Running with strace we can see that without this change we do one pread() and one pwrite() for each block. With this change, we do only one pwrite() per block. $ strace ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 8192 ... pread64(9, "\0", 1, 4095) = 1 pwrite64(9, "\0", 1, 4095) = 1 pread64(9, "\0", 1, 8191) = 1 pwrite64(9, "\0", 1, 8191) = 1 $ strace ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 8192 ... pwrite64(9, "\0", 1, 4095) = 1 pwrite64(9, "\0", 1, 8191) = 1 This happens because posix_fallocate is checking if each block is allocated before writing a byte to the block, and when truncating the file before preallocation, all blocks are unallocated. Signed-off-by: Nir Soffer <nirsof@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-02-24 16:09:22 +01:00
Markus Armbruster	7c81e4e9db	block: Don't bother asserting type of output visitor's output After a visit of a complex QAPI type FOO ov = qobject_output_visitor_new(&foo); visit_type_FOO(ov, NULL, expr, &error_abort); visit_complete(ov, &foo); we can safely assume qobject_type(foo) is QTYPE_QDICT. We do in many places, but occasionally assert qobject_type(obj) == QTYPE_QDICT. Don't. The appropriate place to check such fundamental properties of QAPI visitors is the test suite. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1487363905-9480-15-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-02-22 19:52:20 +01:00
Anton Nefedov	90ab48eb07	mirror: do not increase offset during initial zero_or_discard phase If explicit zeroing out before mirroring is required for the target image, it moves the block job offset counter to EOF, then offset and len counters count the image size twice. There is no harm but stats are confusing, specifically the progress of the operation is always reported as 99% by management tools. The patch skips offset increase for the first "technical" pass over the image. This should not cause any further harm. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1486045515-8009-1-git-send-email-den@openvz.org CC: Jeff Cody <jcody@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:38:00 -05:00
Kevin Wolf	31eb1202d3	iscsi: Add blockdev-add support This adds blockdev-add support for iscsi devices. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:34 -05:00
Kevin Wolf	1d56010482	iscsi: Add timeout option This was previously only available with -iscsi. Again, after this patch, the -iscsi option only takes effect if an URL is given. New users are supposed to use the new driver-specific option. All -iscsi options have a corresponding driver-specific option for the iscsi block driver now. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:26 -05:00
Kevin Wolf	81aa2a0fb5	iscsi: Add header-digest option This was previously only available with -iscsi. Again, after this patch, the -iscsi option only takes effect if an URL is given. New users are supposed to use the new driver-specific option. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:16 -05:00
Kevin Wolf	d4e799292c	iscsi: Add initiator-name option This was previously only available with -iscsi. Again, after this patch, the -iscsi option only takes effect if an URL is given. New users are supposed to use the new driver-specific option. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:08 -05:00
Kevin Wolf	4317142020	iscsi: Handle -iscsi user/password in bdrv_parse_filename() This splits the logic in the old parse_chap() function into a part that parses the -iscsi options into the new driver-specific options, and another part that actually applies those options (called apply_chap() now). Note that this means that username and password specified with -iscsi only take effect when a URL is provided. This is intentional, -iscsi is a legacy interface only supported for compatibility, new users should use the proper driver-specific options. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:36:57 -05:00
Kevin Wolf	d5895fcb1d	iscsi: Split URL into individual options This introduces a .bdrv_parse_filename handler for iscsi which parses an URL if given and translates it to individual options. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:36:34 -05:00
Paolo Bonzini	1ace7ceac5	coroutine-lock: add mutex argument to CoQueue APIs All that CoQueue needs in order to become thread-safe is help from an external mutex. Add this to the API. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-6-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:40 +00:00
Paolo Bonzini	b9e413dd37	block: explicitly acquire aiocontext in aio callbacks that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-16-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	1919631e6b	block: explicitly acquire aiocontext in bottom halves that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-15-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	9d45665448	block: explicitly acquire aiocontext in callbacks that need it This covers both file descriptor callbacks and polling callbacks, since they execute related code. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-14-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:36 +00:00
Paolo Bonzini	2f47da5f7f	block: explicitly acquire aiocontext in timers that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-13-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	b20123a28b	qed: introduce qed_aio_start_io and qed_aio_next_io_cb qed_aio_start_io and qed_aio_next_io will not have to acquire/release the AioContext, while qed_aio_next_io_cb will. Split the functionality and gain a little type-safety in the process. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-11-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	e5c67ab552	blkdebug: reschedule coroutine on the AioContext it is running on Keep the coroutine on the same AioContext. Without this change, there would be a race between yielding the coroutine and reentering it. While the race cannot happen now, because the code only runs from a single AioContext, this will change with multiqueue support in the block layer. While doing the change, replace custom bottom half with aio_co_schedule. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-10-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	ff82911cd3	nbd: convert to use qio_channel_yield In the client, read the reply headers from a coroutine, switching the read side between the "read header" coroutine and the I/O coroutine that reads the body of the reply. In the server, if the server can read more requests it will create a new "read request" coroutine as soon as a request has been read. Otherwise, the new coroutine is created in nbd_request_put. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-8-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	35f106e684	block-backend: allow blk_prw from coroutine context qcow2_create2 calls this. Do not run a nested event loop, as that breaks when aio_co_wake tries to queue the coroutine on the co_queue_wakeup list of the currently running one. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:07 +00:00
Paolo Bonzini	c2b38b277a	block: move AioContext, QEMUTimer, main-loop to libqemuutil AioContext is fairly self contained, the only dependency is QEMUTimer but that in turn doesn't need anything else. So move them out of block-obj-y to avoid introducing a dependency from io/ to block-obj-y. main-loop and its dependency iohandler also need to be moved, because later in this series io/ will call iohandler_get_aio_context. [Changed copyright "the QEMU team" to "other QEMU contributors" as suggested by Daniel Berrange and agreed by Paolo. --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:07 +00:00
Alberto Garcia	7061a07898	qcow2: Optimize the refcount-block overlap check The metadata overlap checks introduced in `a40f1c2add` help detect corruption in the qcow2 image by verifying that data writes don't overlap with existing metadata sections. The 'refcount-block' check in particular iterates over the refcount table in order to get the addresses of all refcount blocks and check that none of them overlap with the region where we want to write. The problem with the refcount table is that since it always occupies complete clusters its size is usually very big. With the default values of cluster_size=64KB and refcount_bits=16 this table holds 8192 entries, each one of them enough to map 2GB worth of host clusters. So unless we're using images with several TB of allocated data this table is going to be mostly empty, and iterating over it is a waste of CPU. If the storage backend is fast enough this can have an effect on I/O performance. This patch keeps the index of the last used (i.e. non-zero) entry in the refcount table and updates it every time the table changes. The refcount-block overlap check then uses that index instead of reading the whole table. In my tests with a 4GB qcow2 file stored in RAM this doubles the amount of write IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20170201123828.4815-1-berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:43 +01:00
Peter Lieven	f67409a5bb	block/nfs: fix naming of runtime opts commit `94d6a7a` accidentally left the naming of runtime opts and QAPI scheme inconsistent. As one consequence passing of parameters in the URI is broken. Sync the naming of the runtime opts to the QAPI scheme. Please note that this is technically backwards incompatible with the 2.8 release, but the 2.8 release is the only version that had the wrong naming. Furthermore release 2.8 suffered from a NULL pointer dereference during URI parsing. Fixes: `94d6a7a76e` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1485942829-10756-3-git-send-email-pl@kamp.de [mreitz: Fixed commit message] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Peter Lieven	8d20abe87a	block/nfs: fix NULL pointer dereference in URI parsing parse_uint_full wants to put the parsed value into the variable passed via its second argument which is NULL. Fixes: `94d6a7a76e` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1485942829-10756-2-git-send-email-pl@kamp.de Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Dou Liyang	a6baa60807	block/qapi: reduce the execution time of qmp_query_blockstats In order to reduce the execution time, this patch optimize the qmp_query_blockstats(): Remove the next_query_bds function. Remove the bdrv_query_stats function. Remove some judgement sentence. The original qmp_query_blockstats calls next_query_bds to get the next objects in each loops. In the next_query_bds, it checks the query_nodes and blk. It also call bdrv_query_stats to get the stats, In the bdrv_query_stats, it checks blk and bs each times. This waste more times, which may stall the main loop a bit. And if the disk is too many and donot use the dataplane feature, this may affect the performance in main loop thread. This patch removes that two functions, and makes the structure clearly. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-id: 1484467275-27919-3-git-send-email-douly.fnst@cn.fujitsu.com Reviewed-by: Markus Armbruster <armbru@redhat.com> [mreitz: Removed duplicate info->value assignment] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Dou Liyang	20a6d768f5	block/qapi: reduce the coupling between the bdrv_query_stats and bdrv_query_bds_stats The bdrv_query_stats and bdrv_query_bds_stats functions need to call each other, that increases the coupling. it also makes the program complicated and makes some unnecessary tests. Remove the call from bdrv_query_bds_stats to bdrv_query_stats, just take some recursion to make it clearly. Avoid testing whether the blk is NULL during querying the bds stats. It is unnecessary. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-id: 1484467275-27919-2-git-send-email-douly.fnst@cn.fujitsu.com Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
QingFeng Hao	4545d4f4af	block/vmdk: Fix the endian problem of buf_len and lba The problem was triggered by qemu-iotests case 055. It failed when it was comparing the compressed vmdk image with original test.img. The cause is that buf_len in vmdk_write_extent wasn't converted to little-endian before it was stored to disk. But later vmdk_read_extent read it and converted it from little-endian to cpu endian. If the cpu is big-endian like s390, the problem will happen and the data length read by vmdk_read_extent will become invalid! The fix is to add the conversion in vmdk_write_extent, meanwhile, repair the endianness problem of lba field which shall also be converted to little-endian before storing to disk. Cc: qemu-stable@nongnu.org Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20161216052040.53067-2-haoqf@linux.vnet.ibm.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Fam Zheng	9adceb0213	qapi: Tweak error message of bdrv_query_image_info @bs doesn't always have a device name, such as when it comes from "qemu-img info". Report file name instead. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20170119130759.28319-2-famz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:41 +01:00
Peter Maydell	4e9f5244e1	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYkeZAAAoJEJykq7OBq3PI6oUH/3qlRvQrWmhWLR+XCtwU0gON HRApL57Of+B1YbqJzb8wzjLMLfzZQYLoT7kf3FDRON751Iwpv2Qyl6j79kbmOQwy txvtgUTtPZrOZ9HMk6M1VboiKrkM1t0I1QiRYy/af2f1gD3KTqIt8YN1ic3xatKD Fgmx+oD+6EkrNilthemvDyaXtGsdTl4GC9ZbGcJB2VJzzWkksRUfeZWysIu9p2zP l6viegW/1+o5wYgBt6DxMalfNGbEiuBgXgx6PVFPbkw0xNURC52qDHhQ91xTSWt1 pvFrIhYWR/ETN0twJh+jtmCjkawKWSsx2nrLlrSh4H0EpwFoRfFqH/ZrOFSg0wg= =QnCX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging # gpg: Signature made Wed 01 Feb 2017 13:44:32 GMT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/tracing-pull-request: trace: clean up trace-events files qapi: add missing trace_visit_type_enum() call trace: improve error reporting when parsing simpletrace header trace: update docs to reflect new code generation approach trace: switch to modular code generation for sub-directories trace: move setting of group name into Makefiles trace: move hw/i386/xen events to correct subdir trace: move hw/xen events to correct subdir trace: move hw/block/dataplane events to correct subdir make: move top level dir to end of include search path # Conflicts: # Makefile Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-02 16:08:28 +00:00
Paolo Bonzini	acf6e5f096	sheepdog: reorganize check for overlapping requests Wrap the code that was copied repeatedly in the two functions, sd_aio_setup and sd_aio_complete. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-6-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	c4080e9391	sheepdog: simplify inflight_aio_head management Add to the list in add_aio_request and, indirectly, resend_aioreq. Inline free_aio_req in the caller, it does not simply undo alloc_aio_req's job. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	28ddd08cd6	sheepdog: do not use BlockAIOCB Sheepdog's AIOCB are completely internal entities for a group of requests and do not need dynamic allocation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-4-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	e80ab33dc0	sheepdog: reorganize coroutine flow Delimit co_recv's lifetime clearly in aio_read_response. Do a simple qemu_coroutine_enter in aio_read_response, letting sd_co_writev call sd_write_done. Handle nr_pending in the same way in sd_co_rw_vector, sd_write_done and sd_co_flush_to_disk. Remove sd_co_rw_vector's return value; just leave with no pending requests. [Jeff: added missing 'return' back, spotted by Paolo after series was applied.] Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	a71264f9f5	sheepdog: remove unused cancellation support SheepdogAIOCB is internal to sheepdog.c, hence it is never canceled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Stefan Hajnoczi	7f4076c1bb	trace: clean up trace-events files There are a number of unused trace events that scripts/cleanup-trace-events.pl finds. The "hw/vfio/pci-quirks.c" filename was typoed and "qapi/qapi-visit-core.c" was missing the qapi/ directory prefix. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170126171613.1399-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-31 17:12:15 +00:00
Peter Lieven	2deb63c2da	block/iscsi: statically link qemu_iscsi_opts commit `f57b4b5f` moved qemu_iscsi_opts into vl.c. This made them invisible for qemu-img, qemu-nbd etc. Fixes: `f57b4b5fb1` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1485262161-18543-1-git-send-email-pl@kamp.de> [Drop useless #ifdef. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:58 +01:00
Eric Farman	c4c41a0a65	block: get max_transfer limit for char (scsi-generic) devices We can get the maximum number of bytes for a single I/O transfer from the BLKSECTGET ioctl, but we only perform this for block devices. scsi-generic devices are represented as character devices, and so do not issue this today. Update this, so that virtio-scsi devices using the scsi-generic interface can return the same data. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170120162527.66075-4-farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Eric Farman	482652502e	block: Fix target variable of BLKSECTGET ioctl Commit `6f6071745b` ("raw-posix: Fetch max sectors for host block device") introduced a routine to call the kernel BLKSECTGET ioctl, which stores the result back to user space. However, the size of the data returned depends on the routine handling the ioctl. The (compat_)blkdev_ioctl returns a short, while sg_ioctl returns an int. Thus, on big-endian systems, we can find ourselves accidentally shifting the result to a much larger value. (On s390x, a short is 16 bits while an int is 32 bits.) Also, the two ioctl handlers return values in different scales (block returns sectors, while sg returns bytes), so some tweaking of the outputs is required such that hdev_get_max_transfer_length returns a value in a consistent set of units. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170120162527.66075-3-farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Peter Lieven	1da45e0c4c	block/iscsi: avoid data corruption with cache=writeback nb_cls_shrunk in iscsi_allocmap_update can become -1 if the request starts and ends within the same cluster. This results in passing -1 to bitmap_set and bitmap_clear and they don't handle negative values properly. In the end this leads to data corruption. Fixes: `e1123a3b40` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1484579832-18589-1-git-send-email-pl@kamp.de> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Ashijeet Acharya	fe44dc9180	migration: disallow migrate_add_blocker during migration If a migration is already in progress and somebody attempts to add a migration blocker, this should rightly fail. Add an errp parameter and a retcode return value to migrate_add_blocker. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1484566314-3987-5-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Merged with recent 'Allow invtsc migration' change	2017-01-24 18:00:30 +00:00
Ashijeet Acharya	05551e58ee	block/vvfat: Remove the undesirable comment Remove the "// assert(is_consistent(s))" comment in block/vvfat.c Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1484566314-3987-2-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 17:54:47 +00:00
Paolo Bonzini	8f90b5e91d	block: get rid of bdrv_io_unplugged_begin/end bdrv_io_plug and bdrv_io_unplug are only called (via their BlockBackend equivalents) after starting asynchronous I/O. bdrv_drain is not going to be called while they are running, because---even if a coroutine runs for some reason---it will only drain in the next iteration of the event loop through bdrv_co_yield_to_drain. So this mechanism is unnecessary, get rid of it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113334.605-1-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-16 13:25:17 +00:00
Eric Blake	c1bb86cd8a	block: Rename raw-{posix,win32} to file-*.c These files deal with the file protocol, not the raw format (the file protocol is often used with other formats, and the raw format is not forced to use the file protocol). Rename things to make it a bit easier to follow. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-01-09 13:30:53 +01:00
Eric Blake	2e6fc7eb1a	block: Rename raw_bsd to raw-format.c Given that we have raw-win32.c and raw-posix.c, my initial guess at raw_bsd.c was that it was for dealing with raw files using code specific to the BSD operating system (beyond what raw-posix could do). Not so - this name was chosen back in commit `e1c66c6` to distinguish that it was a BSD licensed file, in contrast to the then-existing raw.c with an unclear and potentially unusable license. But since it has been more than three years since the rewrite, it's time to pick a more useful name for this file to avoid this type of confusion to future contributors that don't know the backstory, as none of our other files are named solely by the license they use. In reality, this file deals with the raw format, which is useful with any number of protocols, while raw-{win32,posix} deal with the file protocol (and in turn, that protocol is not limited to use with the raw format). So rename raw_bsd to raw-format.c. We could have also used the shorter name raw.c, except that collides with the earlier use of that filename for a different license, and it's better to be safe than risk license pollution. The next patch will also rename raw-win32.c and raw-posix.c to further distinguish the difference in roles. It doesn't hurt that this gets rid of an underscore in the filename, thereby making tab-completion on 'ra<TAB>' easier (now I don't have to type the shift key, which slows things down :) Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	44b6789299	blkverify: Implement bdrv_co_preadv/pwritev/flush This enables byte granularity requests for blkverify, and at the same time gets us rid of another user of the BDS-level AIO emulation. The reference output of a test case must be changed because the verification failure message reports byte offsets instead of sectors now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	7c3a998531	blkdebug: Implement bdrv_co_preadv/pwritev/flush This enables byte granularity requests for blkdebug, and at the same time gets us rid of another user of the BDS-level AIO emulation. Note that unless align=512 is specified, this can behave subtly different from the old behaviour because bdrv_co_preadv/pwritev don't have to perform alignment adjustments any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	7c37f941d0	quorum: Clean up quorum_aio_get() Make sure that all fields of the new QuorumAIOCB are zeroed when the function returns even without explicitly setting them. This will protect us when new fields are added, removes some explicit zero assignment and makes the code a little nicer to read. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	a7e159025e	quorum: Inline quorum_fifo_aio_cb() Inlining the function removes some boilerplace code and replaces recursion by a simple loop, so the code becomes somewhat easier to understand. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	6847da3808	quorum: Implement .bdrv_co_preadv/pwritev() This enables byte granularity requests on quorum nodes. Note that the QMP events emitted by the driver are an external API that we were careless enough to define as sector based. The offset and length of requests reported in events are rounded therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	dee66e2882	quorum: Avoid bdrv_aio_writev() for rewrites Replacing it with bdrv_co_pwritev() prepares us for byte granularity requests and gets us rid of the last bdrv_aio_*() user in quorum. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	7cd9b3964e	quorum: Inline quorum_aio_cb() This is a conversion to a more natural coroutine style and improves the readability of the driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	0f31977d9d	quorum: Do cleanup in caller coroutine Instead of calling quorum_aio_finalize() deeply nested in what used to be an AIO callback, do it in the same functions that allocated the AIOCB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	ce15dc08ef	quorum: Implement .bdrv_co_readv/writev This converts the quorum block driver from implementing callback-based interfaces for read/write to coroutine-based ones. This is the first step that will allow us further simplification of the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	10c8551968	quorum: Remove s from quorum_aio_get() arguments There is no point in passing the value of bs->opaque in order to overwrite it with itself. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Stefan Hajnoczi	ee68697551	linux-aio: poll ring for completions The Linux AIO userspace ABI includes a ring that is shared with the kernel. This allows userspace programs to process completions without system calls. Add an AioContext poll handler to check for completions in the ring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:48 +00:00
Stefan Hajnoczi	f6a51c84cd	aio: add AioPollFn and io_poll() interface The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch. Keep this code change separate due to the number of files it touches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:48 +00:00
Stefan Hajnoczi	68701de136	Block layer patches for 2.8.0-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJYRs7XAAoJEH8JsnLIjy/W5ScP/0ukNPAUUYGnU5dIj6q20kRk kIhEDZXNRMqT9qhSypi5ivOc/HDIXBUM5jsuAPHTK/JxMWeAFCwOwKXzKLPEZ9iG nA5UGMST4uwwB0bANUAyweGIHdTIfhgN2dgLKvKIsbeTNmCyMaJieZ29fkVbNKyS msOFmaVn+nm4rgE/q7HXtg/hUdjuoaEOsWsJ7YY3Bj8kgQR/H8iPCvNCl3YWGwIW 9vXQL25QUaQbBWinA+rHhHiGPAK2GzitVry3fQGch5j4OqpXYt3IQTSqXZMQLHcT zUwcx99C16hG9R7sjhNuto+lMuw6qtK75s/7PGpjw8aQFwYR5ITAyB369dxmrGqc 1bBPsRCXfhWku/4wMzrj4fO7iszMadBIzChwk+IsCRNAWHFGoc9VHvk3mdT4puBB 2W4JlOzGY/FD/rQRetGGmGN09HheRZ5sW7o9DoUoGBLCk1llIrVs/fKtHLDtx1T9 sNe5e7EYdNufBTpy7p/75nRMyYlVlENJW/A+nw2pvsEZjU9LAdWqEEfmU1OU1rdl TEZxBZQtPq0ZkfQIV6mPUFBhE3e8EbB573jJaD2P8StpR6jCFRZlU2hkW4c7FnZ1 pUfc8PJFP4Bo6CaNy7PMUCUHD7z8eZP/BFndODAgBFm2WAxN2tsRrfPDWh23huZV k9+VZFjeT2jz+mHtRCbA =9y67 -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches for 2.8.0-rc3 # gpg: Signature made Tue 06 Dec 2016 02:44:39 PM GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: qcow2: Don't strand clusters near 2G intervals during commit Message-id: 1481037418-10239-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-12-06 17:35:29 +00:00
Eric Blake	a3e1505dae	qcow2: Don't strand clusters near 2G intervals during commit The qcow2_make_empty() function is reached during 'qemu-img commit', in order to clear out ALL clusters of an image. However, if the image cannot use the fast code path (true if the image is format 0.10, or if the image contains a snapshot), the cluster size is larger than 512, and the image is larger than 2G in size, then our choice of sector_step causes problems. Since it is not cluster aligned, but qcow2_discard_clusters() silently ignores an unaligned head or tail, we are leaving clusters allocated. Enhance the testsuite to expose the flaw, and patch the problem by ensuring our step size is aligned. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-12-06 15:37:02 +01:00
Prasanna Kumar Kalever	7103d9165b	block/nfs: fix QMP to match debug option The QMP definition of BlockdevOptionsNfs: { 'struct': 'BlockdevOptionsNfs', 'data': { 'server': 'NFSServer', 'path': 'str', 'user': 'int', 'group': 'int', 'tcp-syn-count': 'int', 'readahead-size': 'int', 'page-cache-size': 'int', 'debug-level': 'int' } } To make this consistent with other block protocols like gluster, lets change s/debug-level/debug/ Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-12-05 16:30:21 -05:00
Prasanna Kumar Kalever	1a417e46ae	block/gluster: fix QMP to match debug option The QMP definition of BlockdevOptionsGluster: { 'struct': 'BlockdevOptionsGluster', 'data': { 'volume': 'str', 'path': 'str', 'server': ['GlusterServer'], 'debug-level': 'int', 'logfile': 'str' } } But instead of 'debug-level we have exported 'debug' as the option for choosing debug level of gluster protocol driver. This patch fix QMP definition BlockdevOptionsGluster s/debug-level/debug/ Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-12-05 16:30:15 -05:00
Stefan Hajnoczi	f05234df63	Block layer patches for 2.8.0-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJYPZu6AAoJEH8JsnLIjy/W1IAP/AwV0sWafsSMnWiz/4NVqeh3 Yk2cBtxCBmnq1y+PilLoZBdHui/RumwVuZKaShs3JA1n5CB1AjsVtEVl/6rQM7lv yymLr32pODuf4eaGwGY09FqTiL0Erlm846zbSDjkiKbTYoKpzRv0PT2iiA6yTnjO Mrs5nG7kEWdXPZ0ZsJyEyU3+vs7rNg+4N/VfTdPmCrV5DVBvAeCawM6JXHQNc7LV ER6Y8W9PAu5mYqwekjAW07lPCudytAsOTrbTTO9Sv/+kZUdKEmv7ZHJrPdECCb6N vcPOYOzKsEvvR8E0YZtuJDK9W4RTakxdlTste+TtW3VSt1Cs0zpvCFytaGuC+Kmq mhlA4lYLDvaiNOMl09SvIjjxGI7+FO+1XsY7e4rI5PJzOKWZMFOIwQMNxE3B2qUI dxd6izf7fzF4V5uDDwHTJ8TAiJDSAe6Bkz+vzipQtu5NARl/isbQuIPIGXPkxZln fkCYA8/7EXrLXqd3khiRqEHS60ZtNgfm4ss8euMlWAgJAz0RLC1d/XhOIxaCQOg3 R/F9UdJAon6mfOgamZs5yzJgaPU6M90g/QipMB3Ub00VODacTiA81QUjZdEgELBB zvhgeja7qdIvOh9r9heCCuUTmkWRRppmkrKdqFLowZ2aWosISy/UjiPTGdjDdq7Z LsfYiRXsW94FmNXdKCqg =3WTz -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches for 2.8.0-rc2 # gpg: Signature made Tue 29 Nov 2016 03:16:10 PM GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: docs: Specify that cache-clean-interval is only supported in Linux qcow2: Remove stale comment qcow2: Allow 'cache-clean-interval' in Linux only qcow2: Make qcow2_cache_table_release() work only in Linux Message-id: 1480436227-2211-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-29 17:06:39 +00:00
Alberto Garcia	a8b99dd516	qcow2: Remove stale comment We haven't been using CONFIG_MADVISE since `02d0e09503` Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Alberto Garcia	91203f08f0	qcow2: Allow 'cache-clean-interval' in Linux only The cache-clean-interval option of qcow2 only works on Linux. However we allow setting it in other systems regardless of whether it works or not. In those systems this option is not simply a no-op: it actually invalidates perfectly valid cache tables for no good reason without freeing their memory. This patch forbids using that option in non-Linux systems. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Alberto Garcia	2f2c8d6b37	qcow2: Make qcow2_cache_table_release() work only in Linux We are using QEMU_MADV_DONTNEED to discard the memory of individual L2 cache tables. The problem with this is that those semantics are specific to the Linux madvise() system call. Other implementations of madvise() (including the very Linux implementation of posix_madvise()) don't do that, so we cannot use them for the same purpose. This patch makes the code Linux-specific and uses madvise() directly since there's no point in going through qemu_madvise() for this. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Stefan Hajnoczi	f0c10c392f	Small fixes for rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJYNMYwAAoJEL/70l94x66DBrUIAKeNK59lTbUm1WVl15nyB2qM jE2804Kcp+EGTwFHeo5GGsb+CplK54uMzHq2wzN6G3EmnaV3xbbdiZ7cmNl5Q6Tr qq7/pAer/T+xvQ3iDOTkAvJcqiMUZIx+MXrFED46KBUtqANJ2tAg2uEEqbI0RbOU +qtMZlPxo3IOuYnVROug1PPdNQDluBvZjrCYtb7VfZNo13u2UGYmRjZttobVfihF AQjv57uiawPs2e3VmUvIH8fjjEgV4MlPLiilL1eYsLaszjIBgdfrQOO7bdfetLo8 THkNJEZTpS9T9ChcbcTKS7yovI3OiIxPMwyftELClacX3wVtSie2WNx0sj/3Xpw= =DPxR -----END PGP SIGNATURE----- Merge remote-tracking branch 'bonzini/tags/for-upstream' into staging Small fixes for rc1. # gpg: Signature made Tue 22 Nov 2016 10:26:56 PM GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * bonzini/tags/for-upstream: scsi/esp: do not raise an interrupt when reading the FIFO register nbd: Allow unmap and fua during write zeroes cpu_ldst.h: use correct guest address parameter Message-id: 1479853676-35995-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-23 11:44:29 +00:00
Eric Blake	169407e1f7	nbd: Allow unmap and fua during write zeroes Commit `fa778fff` wired up support to send the NBD_CMD_WRITE_ZEROES, but forgot to inform the block layer that FUA unmapping of zeroes is supported. Without BDRV_REQ_MAY_UNMAP listed as a supported flag, the block layer will always insist on the NBD layer passing NBD_CMD_FLAG_NO_HOLE, resulting in the server always allocating things even when it was desired to let the server punch holes. Similarly, failing to set BDRV_REQ_FUA means that the client may send unnecessary NBD_CMD_FLUSH when it could have instead used the NBD_CMD_FLAG_FUA bit. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1479413642-22463-2-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-22 23:26:51 +01:00
Eric Blake	3482b9bc41	block: Pass unaligned discard requests to drivers Discard is advisory, so rounding the requests to alignment boundaries is never semantically wrong from the data that the guest sees. But at least the Dell Equallogic iSCSI SANs has an interesting property that its advertised discard alignment is 15M, yet documents that discarding a sequence of 1M slices will eventually result in the 15M page being marked as discarded, and it is possible to observe which pages have been discarded. Between commits `9f1963b` and `b8d0a980`, we converted the block layer to a byte-based interface that ultimately ignores any unaligned head or tail based on the driver's advertised discard granularity, which means that qemu 2.7 refuses to pass any discard request smaller than 15M down to the Dell Equallogic hardware. This is a slight regression in behavior compared to earlier qemu, where a guest executing discards in power-of-2 chunks used to be able to get every page discarded, but is now left with various pages still allocated because the guest requests did not align with the hardware's 15M pages. Since the SCSI specification says nothing about a minimum discard granularity, and only documents the preferred alignment, it is best if the block layer gives the driver every bit of information about discard requests, rather than rounding it to alignment boundaries early. Rework the block layer discard algorithm to mirror the write zero algorithm: always peel off any unaligned head or tail and manage that in isolation, then do the bulk of the request on an aligned boundary. The fallback when the driver returns -ENOTSUP for an unaligned request is to silently ignore that portion of the discard request; but for devices that can pass the partial request all the way down to hardware, this can result in the hardware coalescing requests and discarding aligned pages after all. Reported by: Peter Lieven <pl@kamp.de> CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:23 +01:00
Eric Blake	49228d1e95	block: Return -ENOTSUP rather than assert on unaligned discards Right now, the block layer rounds discard requests, so that individual drivers are able to assert that discard requests will never be unaligned. But there are some ISCSI devices that track and coalesce multiple unaligned requests, turning it into an actual discard if the requests eventually cover an entire page, which implies that it is better to always pass discard requests as low down the stack as possible. In isolation, this patch has no semantic effect, since the block layer currently never passes an unaligned request through. But the block layer already has code that silently ignores drivers that return -ENOTSUP for a discard request that cannot be honored (as well as drivers that return 0 even when nothing was done). But the next patch will update the block layer to fragment discard requests, so that clients are guaranteed that they are either dealing with an unaligned head or tail, or an aligned core, making it similar to the block layer semantics of write zero fragmentation. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:22 +01:00
Eric Blake	b2f95feec5	block: Let write zeroes fallback work even with small max_transfer Commit `443668ca` rewrote the write_zeroes logic to guarantee that an unaligned request never crosses a cluster boundary. But in the rewrite, the new code assumed that at most one iteration would be needed to get to an alignment boundary. However, it is easy to trigger an assertion failure: the Linux kernel limits loopback devices to advertise a max_transfer of only 64k. Any operation that requires falling back to writes rather than more efficient zeroing must obey max_transfer during that fallback, which means an unaligned head may require multiple iterations of the write fallbacks before reaching the aligned boundaries, when layering a format with clusters larger than 64k atop the protocol of file access to a loopback device. Test case: $ qemu-img create -f qcow2 -o cluster_size=1M file 10M $ losetup /dev/loop2 /path/to/file $ qemu-io -f qcow2 /dev/loop2 qemu-io> w 7m 1k qemu-io> w -z 8003584 2093056 In fairness to Denis (as the original listed author of the culprit commit), the faulty logic for at most one iteration is probably all my fault in reworking his idea. But the solution is to restore what was in place prior to that commit: when dealing with an unaligned head or tail, iterate as many times as necessary while fragmenting the operation at max_transfer boundaries. Reported-by: Ed Swierk <eswierk@skyportsystems.com> CC: qemu-stable@nongnu.org CC: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:22 +01:00
Eric Blake	ecdbead659	qcow2: Inform block layer about discard boundaries At the qcow2 layer, discard is only possible on a per-cluster basis; at the moment, qcow2 silently rounds any unaligned requests to this granularity. However, an upcoming patch will fix a regression in the block layer ignoring too much of an unaligned discard request, by changing the block layer to break up a discard request at alignment boundaries; for that to work, the block layer must know about our limits. However, we can't go one step further by changing qcow2_discard_clusters() to assert that requests are always aligned, since that helper function is reached on paths outside of the block layer. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:22 +01:00
Kevin Wolf	668c0e441d	gluster: Fix use after free in glfs_clear_preopened() This fixes a use-after-free bug introduced in commit `6349c154`. We need to use QLIST_FOREACH_SAFE() when freeing elements in the loop. Spotted by Coverity. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1479378608-11962-1-git-send-email-kwolf@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-21 17:04:43 -05:00
Paolo Bonzini	bdffb31d8e	mirror: do not flush every time the disks are synced This puts a huge strain on the disks when there are many concurrent migrations. With this patch we only flush twice: just before issuing the event, and just before pivoting to the destination. If management will complete the job close to the BLOCK_JOB_READY event, the cost of the second flush should be small anyway. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161109162008.27287-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:49:26 -05:00
Max Reitz	4e504535c1	block/curl: Do not wait for data beyond EOF libcurl will only give us as much data as there is, not more. The block layer will deny requests beyond the end of file for us; but since this block driver is still using a sector-based interface, we can still get in trouble if the file size is not a multiple of 512. While we have already made sure not to attempt transfers beyond the end of the file, we are currently still trying to receive data from there if the original request exceeds the file size. This patch fixes this issue and invokes qemu_iovec_memset() on the iovec's tail. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20161025025431.24714-5-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	ff5ca1664a	block/curl: Remember all sockets For some connection types (like FTP, generally), more than one socket may be used (in FTP's case: control vs. data stream). As of commit `838ef60249` ("curl: Eliminate unnecessary use of curl_multi_socket_all"), we have to remember all of the sockets used by libcurl, but in fact we only did that for a single one. Since one libcurl connection may use multiple sockets, however, we have to remember them all. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20161025025431.24714-4-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	4e7676571b	block/curl: Fix return value from curl_read_cb While commit `38bbc0a580` is correct in that the callback is supposed to return the number of bytes handled; what it does not mention is that libcurl will throw an error if the callback did not "handle" all of the data passed to it. Therefore, if the callback receives some data that it cannot handle (either because the receive buffer has not been set up yet or because it would not fit into the receive buffer) and we have to ignore it, we still have to report that the data has been handled. Obviously, this should not happen normally. But it does happen at least for FTP connections where some data (that we do not expect) may be generated when the connection is established. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20161025025431.24714-3-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	9054d9f6b0	block/curl: Use BDRV_SECTOR_SIZE Currently, curl defines its own constant SECTOR_SIZE. There is no advantage over using the global BDRV_SECTOR_SIZE, so drop it. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20161025025431.24714-2-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	23dce3873f	block/curl: Drop TFTP "support" Because TFTP does not support byte ranges, it was never usable with our curl block driver. Since apparently nobody has ever complained loudly enough for someone to take care of the issue until now, it seems reasonable to assume that nobody has ever actually used it. Therefore, it should be safe to just drop it from curl's protocol list. [Jeff Cody: Below is additional summary pulled, with some rewording, from followup emails between Max and Markus, to explain what worked and what didn't] TFTP would sometimes work, to a limited extent, for images <= the curl "readahead" size, so long as reads started at offset zero. By default, that readahead size is 256KB. Reads starting at a non-zero offset would also have returned data from a zero offset. It can become more complicated still, with mixed reads at zero offset and non-zero offsets, due to data buffering. In short, TFTP could only have worked before in very specific scenarios with unrealistic expectations and constraints. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20161102175539.4375-4-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	111049a4ec	blockjob: refactor backup_start as backup_job_create Refactor backup_start as backup_job_create, which only creates the job, but does not automatically start it. The old interface, 'backup_start', is not kept in favor of limiting the number of nearly-identical interfaces that would have to be edited to keep up with QAPI changes in the future. Callers that wish to synchronously start the backup_block_job can instead just call block_job_start immediately after calling backup_job_create. Transactions are updated to use the new interface, calling block_job_start only during the .commit phase, which helps prevent race conditions where jobs may finish before we even finish building the transaction. This may happen, for instance, during empty block backup jobs. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-6-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	5ccac6f186	blockjob: add block_job_start Instead of automatically starting jobs at creation time via backup_start et al, we'd like to return a job object pointer that can be started manually at later point in time. For now, add the block_job_start mechanism and start the jobs automatically as we have been doing, with conversions job-by-job coming in later patches. Of note: cancellation of unstarted jobs will perform all the normal cleanup as if the job had started, particularly abort and clean. The only difference is that we will not emit any events, because the job never actually started. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	a7815a764c	blockjob: add .start field Add an explicit start field to specify the entrypoint. We already have ownership of the coroutine itself AND managing the lifetime of the coroutine, let's take control of creation of the coroutine, too. This will allow us to delay creation of the actual coroutine until we know we'll actually start a BlockJob in block_job_start. This avoids the sticky question of how to "un-create" a Coroutine that hasn't been started yet. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478587839-9834-4-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	e8a40bf71d	blockjob: add .clean property Cleaning up after we have deferred to the main thread but before the transaction has converged can be dangerous and result in deadlocks if the job cleanup invokes any BH polling loops. A job may attempt to begin cleaning up, but may induce another job to enter its cleanup routine. The second job, part of our same transaction, will block waiting for the first job to finish, so neither job may now make progress. To rectify this, allow jobs to register a cleanup operation that will always run regardless of if the job was in a transaction or not, and if the transaction job group completed successfully or not. Move sensitive cleanup to this callback instead which is guaranteed to be run only after the transaction has converged, which removes sensitive timing constraints from said cleanup. Furthermore, in future patches these cleanup operations will be performed regardless of whether or not we actually started the job. Therefore, cleanup callbacks should essentially confine themselves to undoing create operations, e.g. setup actions taken in what is now backup_start. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478587839-9834-3-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Stefan Hajnoczi	682df581c6	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYKeNwAAoJEH3vgQaq/DkO2uAQAKUidQMRQjHs3T5vyb7PcXCe DVV3PO+xKIFl+eWbjDYH2OdPL8OzgyNcGnwtHkdogKklWvYMD002vQ9YmNa2cbJn cO5d8jzSRtsTTLSbtjipFIrvJ8FxedX3Jay0cvEbaEqkgZXJV1sXN5CJ/Cseyf+G IZrG047Kf4V3inV8RDvJ9U/VcSlIcst9icZOuLlONvhXM7f+R5CkvqwUn4yVOObt Wwq32r47Dd9BwzrpxM//7haDvAXYm/xcP3bImN/3LAAwYPGkswxOe1I7Q62+fbpe dd/FSfhe6nRjStKTtH7T+AQk1VJKw34su9/FSxzIZaCzHYMco5CIziCwi0s4BocR GqZ0E0oPxWvrrFhljBxt1wA4d2j354Wq2cGbmb7rQpJTEbfGH5nDHqF1FAbMmd8N F9H6tSCvh1xJaJngGZjlMsgs6TkqyQEnCjk7SSAs1XS+qyrcyOWk7ydzAAc/RIHl iIN4aLcL7ix1rcoVttw+4VOSvihas6nTvRPPwVTbHO5003QpXdr3dckQaASP3PTd wky7blVk8+O8Y242F0AAYUb04agZ+KpqsaOcCL3SIPc3yBv3JCNCNy0gH4WIBX66 yYxTgRtaNhHiUWaVQLximq1QUjz+vsTE07FI56PSabz1e/RkRp+BbrwaYLKYy+/F jBfRpP7pkPIWJhrPmYpJ =fKei -----END PGP SIGNATURE----- Merge remote-tracking branch 'jsnow/tags/ide-pull-request' into staging # gpg: Signature made Mon 14 Nov 2016 04:16:48 PM GMT # gpg: using RSA key 0x7DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * jsnow/tags/ide-pull-request: ahci-test: add QMP tray test for ATAPI libqos/ahci: Add get_sense and test_ready libqos/ahci: Add ATAPI tray macros libqos/ahci: Support expected errors libqtest: add qmp_eventwait_ref block-backend: Always notify on blk_eject ahci-test: test atapi read_cd with bcl, nb_sectors = 0 ahci-test: Create smaller test ISO images atapi: classify read_cd as conditionally returning data Message-id: 1479140746-22142-1-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-14 17:07:16 +00:00
John Snow	c47ee043dc	block-backend: Always notify on blk_eject blk_eject is only used by scsi-disk and atapi, and in both cases we only attempt to invoke blk_eject if we have a bona-fide change in tray state. The "issue" here is that the tray state does not generate a QMP event unless there is a medium/BDS attached to the device, so if libvirt et al are waiting for a tray event to occur from an empty-but-closed drive, software opening that drive will not emit an event and libvirt will wait forever. Change this by modifying blk_eject to always emit an event, instead of conditionally on a "real" backend eject. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1373264 Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478553214-497-2-git-send-email-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2016-11-14 11:15:54 -05:00
Fam Zheng	4e6d13c983	raw-posix: Rename 'raw_s' to 'rs' It is too confusing because it sounds like a BDRVRawState variable. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1477565117-17230-1-git-send-email-famz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-11-11 15:56:22 +01:00

... 2 3 4 5 6 ...

3147 Commits