qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Stefano Stabellini	cf45183b71	Revert "9p: init_in_iov_from_pdu can truncate the size" This reverts commit `16724a1730`. It causes https://bugs.launchpad.net/bugs/1877688. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200521192627.15259-1-sstabellini@kernel.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 11:45:38 +02:00
Dan Robertson	03556ea920	9pfs: include linux/limits.h for XATTR_SIZE_MAX linux/limits.h should be included for the XATTR_SIZE_MAX definition used by v9fs_xattrcreate. Fixes: `3b79ef2cf4` ("9pfs: limit xattr size in xattrcreate") Signed-off-by: Dan Robertson <dan@dlrobertson.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200515203015.7090-2-dan@dlrobertson.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 10:38:03 +02:00
Markus Armbruster	b69c3c21a5	qdev: Unrealize must not fail Devices may have component devices and buses. Device realization may fail. Realization is recursive: a device's realize() method realizes its components, and device_set_realized() realizes its buses (which should in turn realize the devices on that bus, except bus_set_realized() doesn't implement that, yet). When realization of a component or bus fails, we need to roll back: unrealize everything we realized so far. If any of these unrealizes failed, the device would be left in an inconsistent state. Must not happen. device_set_realized() lets it happen: it ignores errors in the roll back code starting at label child_realize_fail. Since realization is recursive, unrealization must be recursive, too. But how could a partly failed unrealize be rolled back? We'd have to re-realize, which can fail. This design is fundamentally broken. device_set_realized() does not roll back at all. Instead, it keeps unrealizing, ignoring further errors. It can screw up even for a device with no buses: if the lone dc->unrealize() fails, it still unregisters vmstate, and calls listeners' unrealize() callback. bus_set_realized() does not roll back either. Instead, it stops unrealizing. Fortunately, no unrealize method can fail, as we'll see below. To fix the design error, drop parameter @errp from all the unrealize methods. Any unrealize method that uses @errp now needs an update. This leads us to unrealize() methods that can fail. Merely passing it to another unrealize method cannot cause failure, though. Here are the ones that do other things with @errp: * virtio_serial_device_unrealize() Fails when qbus_set_hotplug_handler() fails, but still does all the other work. On failure, the device would stay realized with its resources completely gone. Oops. Can't happen, because qbus_set_hotplug_handler() can't actually fail here. Pass &error_abort to qbus_set_hotplug_handler() instead. 
* hw/ppc/spapr_drc.c's unrealize() Fails when object_property_del() fails, but all the other work is already done. On failure, the device would stay realized with its vmstate registration gone. Oops. Can't happen, because object_property_del() can't actually fail here. Pass &error_abort to object_property_del() instead. * spapr_phb_unrealize() Fails and bails out when remove_drcs() fails, but other work is already done. On failure, the device would stay realized with some of its resources gone. Oops. remove_drcs() fails only when chassis_from_bus()'s object_property_get_uint() fails, and it can't here. Pass &error_abort to remove_drcs() instead. Therefore, no unrealize method can fail before this patch. device_set_realized()'s recursive unrealization via bus uses object_property_set_bool(). Can't drop @errp there, so pass &error_abort. We similarly unrealize with object_property_set_bool() elsewhere, always ignoring errors. Pass &error_abort instead. Several unrealize methods no longer handle errors from other unrealize methods: virtio_9p_device_unrealize(), virtio_input_device_unrealize(), scsi_qdev_unrealize(), ... Much of the deleted error handling looks wrong anyway. One unrealize method no longer ignores such errors: usb_ehci_pci_exit(). Several realize methods no longer ignore errors when rolling back: v9fs_device_realize_common(), pci_qdev_unrealize(), spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(), virtio_device_realize(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-17-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Christian Schoenebeck	d36a5c2270	9pfs: validate count sent by client with T_readdir A good 9p client sends T_readdir with a "count" parameter that's sufficiently smaller than the client's initially negotiated msize (maximum message size). We perform a check for that anyway, to prevent bad clients from interrupting the server with a "Failed to encode VirtFS reply type 41" transport error message. This count value constraint uses msize - 11, because 11 is the header size of R_readdir. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <3990d3891e8ae2074709b56449e96ab4b4b93b7d.1579567020.git.qemu_oss@crudebyte.com> [groug: added comment ] Signed-off-by: Greg Kurz <groug@kaod.org>	2020-02-08 09:28:54 +01:00
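A minimal C sketch of the count check, using a hypothetical helper `clamp_readdir_count()` (not QEMU's code). The commit message does not say whether an oversized count is rejected or clamped; this sketch assumes clamping to msize - 11, where 11 is the R_readdir header size (size[4] + type[1] + tag[2] + count[4]).

```c
#include <assert.h>
#include <stdint.h>

/* R_readdir header: size[4] type[1] tag[2] count[4] = 11 bytes. */
enum { RREADDIR_HDR_SIZE = 4 + 1 + 2 + 4 };

/*
 * Hypothetical helper: limit the client's T_readdir "count" so that the
 * R_readdir reply (header + entries) never exceeds the negotiated msize.
 */
static uint32_t clamp_readdir_count(uint32_t msize, uint32_t count)
{
    uint32_t max_count;

    assert(msize >= RREADDIR_HDR_SIZE); /* holds for any msize >= 4096 */
    max_count = msize - RREADDIR_HDR_SIZE;
    return count > max_count ? max_count : count;
}
```

With msize = 4096, a count of 4086 would be reduced to 4085, while any count up to 4085 passes through unchanged.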
Christian Schoenebeck	e16453a31a	9pfs: require msize >= 4096 A client establishes a session by sending a Tversion request along with an 'msize' parameter, which the client uses to suggest to the server a maximum message size ever to be used for communication (for both requests and replies) between client and server during that session. If the client suggests an 'msize' smaller than 4096, then the server denies the session immediately with an error response (Rlerror for "9P2000.L" clients or Rerror for "9P2000.u" clients) instead of replying with Rversion. So far any msize submitted by the client with Tversion was simply accepted by the server without any check. Introducing some minimum msize makes sense, because e.g. an msize < 7 would not allow any subsequent 9p operation at all, because 7 is the size of the header section common to all 9p message types. A substantially higher value of 4096 was chosen though to prevent potential issues with some message types. E.g. Rreadlink may yield up to a size of PATH_MAX which is usually 4096, and like almost all 9p message types, Rreadlink is not allowed to be truncated by the 9p protocol. This chosen size also prevents a similar issue with Rreaddir responses (provided the client always sends an adequate 'count' parameter with Treaddir), because even though directory entry retrieval may be split up over several T/Rreaddir messages, an Rreaddir response must not truncate individual directory entries. So msize should be large enough to return at least one directory entry with the longest possible file name supported by the host. Most file systems support a max. file name length of 255. The largest known file name length limit is currently that of ReiserFS, with max. 4032 bytes, which is also covered by this min. msize value because 4032 + 35 < 4096. Furthermore 4096 is already the minimum msize of the Linux kernel's 9pfs client. 
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <8ceecb7fb9fdbeabbe55c04339349a36929fb8e3.1579567019.git.qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-02-08 09:28:43 +01:00
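The 35 in the arithmetic above can be reconstructed from the 9P2000.L Rreaddir layout: an 11-byte reply header plus, per entry, qid[13], offset[8], type[1] and a 2-byte name length. That breakdown is a reconstruction, not something the commit states. The sketch below checks it at compile time; `tversion_msize_ok()`, `rreaddir_one_entry_size()` and the enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    P9_MIN_MSIZE = 4096,            /* minimum msize accepted in Tversion */
    P9_HDR       = 4 + 1 + 2,       /* size[4] type[1] tag[2]: common header */
    RREADDIR_HDR = P9_HDR + 4,      /* ... plus count[4] */
    DIRENT_FIXED = 13 + 8 + 1 + 2,  /* qid[13] offset[8] type[1] name len[2] */
};

/* Worst case from the commit: one entry with a 4032-byte name must fit. */
static_assert(RREADDIR_HDR + DIRENT_FIXED == 35, "fixed Rreaddir overhead");
static_assert(RREADDIR_HDR + DIRENT_FIXED + 4032 < P9_MIN_MSIZE,
              "longest known file name fits in one Rreaddir");

/* Hypothetical Tversion check: reject sessions with a too small msize. */
static bool tversion_msize_ok(uint32_t msize)
{
    return msize >= P9_MIN_MSIZE;
}

/* Size of an Rreaddir that carries a single entry with an n-byte name. */
static uint32_t rreaddir_one_entry_size(uint32_t name_len)
{
    return RREADDIR_HDR + DIRENT_FIXED + name_len;
}
```

Note that `P9_HDR` evaluates to 7, the common header size the commit cites as the absolute lower bound.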
Daniel Henrique Barboza	b858e80a02	9pfs/9p.c: remove unneeded labels 'out' label in v9fs_xattr_write() and 'out_nofid' label in v9fs_complete_rename() can be replaced by appropriate return calls. CC: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-01-20 15:11:39 +01:00
Greg Kurz	16724a1730	9p: init_in_iov_from_pdu can truncate the size init_in_iov_from_pdu might not be able to allocate the full buffer size requested, which comes from the client and could be larger than the transport has available at the time of the request. Specifically, this can happen with read operations, with the client requesting a read up to the max allowed, which might be more than the transport has available at the time. Today both the Xen and Virtio implementations of init_in_iov_from_pdu throw an error in that case. Instead, change the V9fsTransport interface so that the size becomes a pointer and can be limited by the implementation of init_in_iov_from_pdu. Change both the Xen and Virtio implementations to set the size to the size of the buffer they managed to allocate, instead of throwing an error. However, if the allocated buffer size is less than P9_IOHDRSZ (the size of the header), still throw an error, as that case cannot be handled. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> CC: groug@kaod.org CC: anthony.perard@citrix.com CC: roman@zededa.com CC: qemu_oss@crudebyte.com [groug: fix 32-bit build] Signed-off-by: Greg Kurz <groug@kaod.org>	2020-01-20 15:11:39 +01:00
Dan Schatzberg	68d654daee	9pfs: Fix divide by zero bug Some filesystems may return 0s in statfs (trivially, a FUSE filesystem can do so). QEMU should handle this gracefully and just behave the same as if statfs failed. Signed-off-by: Dan Schatzberg <dschatzberg@fb.com> Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-11-23 15:51:48 +01:00
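A minimal sketch of the guard, assuming for illustration that the divisor is the block size reported by statfs and that the quotient scales the client's usable payload; `bsize_factor()`, its parameters and the -EINVAL return value are hypothetical. A zero block size is routed to the error path instead of being divided by.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many host blocks of f_bsize bytes fit into
 * payload_size bytes. Some filesystems (e.g. FUSE) can report 0 in statfs;
 * treat that like a failed statfs instead of dividing by zero.
 */
static int64_t bsize_factor(uint32_t payload_size, uint64_t f_bsize)
{
    if (f_bsize == 0) {
        return -EINVAL;   /* assumed error code, same path as statfs failure */
    }
    return (int64_t)(payload_size / f_bsize);
}
```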
Christian Schoenebeck	6b6aa8285d	9p: Use variable length suffixes for inode remapping Use variable length suffixes for inode remapping instead of the fixed 16 bit size prefixes before. With this change the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with the previous fixed size inode remapping). Additionally this solution is more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well, so there is less likely to be a need for generating and tracking additional suffixes, which might also be beneficial for nested virtualization where each level of virtualization would shift up the inode bits and increase the chance of expensive remapping actions. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a parameter k which controls the distribution of bits on increasing indices (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 
65533 [1111111111111101] -> [1011111111111111000000000000000] (31 bits) 65534 [1111111111111110] -> [0111111111111111000000000000000] (31 bits) 65535 [1111111111111111] -> [1111111111111111000000000000000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [000001] (6 bits) 2 [10] -> [100001] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits) 65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits) 65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits) Hence minBits=6 maxBits=28 Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:23 +02:00
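The tables above can be reproduced with order-k Exponential Golomb coding followed by a bit reversal, which turns the Golomb prefix code into a suffix. The sketch below is a reconstruction checked against the listed entries, not QEMU's implementation; `golomb_suffix()` is a hypothetical name, and it prints bits in the same left-to-right order as the tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Writes the suffix for index n (n >= 1) with Exp-Golomb parameter k into
 * out (at least 65 bytes) as a '0'/'1' string in the order printed in the
 * tables, and returns its length in bits.
 */
static int golomb_suffix(uint64_t n, int k, char *out)
{
    uint64_t value;
    int bits = 0, total, i;

    assert(n >= 1 && k >= 0 && k < 32);
    value = n + (UINT64_C(1) << k) - 1;   /* shift into the order-k range */
    for (uint64_t v = value; v; v >>= 1) {
        bits++;                            /* bit length of value */
    }
    /* Exp-Golomb prefix code: (bits - 1 - k) leading zeros, then value. */
    total = bits + (bits - 1 - k > 0 ? bits - 1 - k : 0);
    assert(total <= 64);

    /* Emit the prefix code LSB first, i.e. bit-reversed into a suffix. */
    for (i = 0; i < total; i++) {
        out[i] = ((value >> i) & 1) ? '1' : '0';
    }
    out[total] = '\0';
    return total;
}
```

For example, index 5 with k=0 yields "10100" (5 bits) and index 12 with k=5 yields "110101" (6 bits), matching the rows above; larger k trades longer short suffixes for a lower maximum length.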
Antonios Motakis	f3fe4a2d92	9p: stat_to_qid: implement slow path stat_to_qid attempts, via qid_path_prefixmap, to map unique files (which are identified by 64 bit inode nr and 32 bit device id) to a 64 bit QID path value. However this implementation makes some assumptions about inode number generation on the host. If qid_path_prefixmap fails, we still have 48 bits available in the QID path to fall back to a less memory efficient full mapping. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Updated hash calls to new xxhash API. - Removed unnecessary parentheses in qpf_lookup_func(). - Removed unnecessary g_malloc0() result checks. - Log error message when running out of prefixes in qid_path_fullmap(). - Log warning message about potential degraded performance in qid_path_prefixmap(). - Wrapped qpf_table initialization to dedicated qpf_table_init() function. - Fixed typo in comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:14 +02:00
Antonios Motakis	1a6ed33cc5	9p: Added virtfs option 'multidevs=remap|forbid|warn' 'warn' (default): Only log an error message (once) on the host if more than one device is shared by the same export, but otherwise ignore this config error. This is the default behaviour, so as not to break existing installations, which are assumed to know what they are doing. 'forbid': Like 'warn', but in addition to logging an error, this also denies the guest access to additional devices. 'remap': Allows sharing more than one device per export by remapping inodes from host to guest appropriately. To support multiple devices on the 9p share, and to avoid qid path collisions, we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Added virtfs option 'multidevs', original patch simply did the inode remapping without being asked. - Updated hash calls to new xxhash API. - Updated docs for new option 'multidevs'. - Fixed v9fs_do_readdir() not having remapped inodes. - Log error message when running out of prefixes in qid_path_prefixmap(). - Fixed definition of QPATH_INO_MASK. - Wrapped qpp_table initialization to dedicated qpp_table_init() function. - Dropped unnecessary parentheses in qpp_lookup_func(). - Dropped unnecessary g_malloc0() result checks. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [groug: - Moved "multidevs" parsing to the local backend. - Added hint to invalid multidevs option error. - Turn "remap" into "x-remap". ] Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:05 +02:00
Antonios Motakis	3b5ee9e86b	9p: Treat multiple devices on one export as an error The QID path should uniquely identify a file. However, the inode of a file is currently used as the QID path, which on its own only uniquely identifies files within a device. Here we track the device hosting the 9pfs share, in order to prevent security issues with QID path collisions from other devices. We only print a warning for now but a subsequent patch will allow users to have finer control over the desired behaviour. Failing the I/O will be one of the proposed behaviours, so we also change stat_to_qid() to return an error here in order to keep other patches simpler. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Assign dev_id to export root's device already in v9fs_device_realize_common(), not postponed in stat_to_qid(). - error_report_once() if more than one device was shared by export. - Return -ENODEV instead of -ENOSYS in stat_to_qid(). - Fixed typo in log comment. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [groug, changed to warning, updated message and changelog] Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:05 +02:00
Greg Kurz	c0da0cb761	9p: Simplify error path of v9fs_device_realize_common() Make v9fs_device_unrealize_common() idempotent and use it for rollback, in order to reduce code duplication. Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:04 +02:00
Antonios Motakis	8703283352	9p: unsigned type for type, version, path There is no need for signedness on these QID fields for 9p. Signed-off-by: Antonios Motakis <antonios.motakis@huawei.com> [CS: - Also make QID type unsigned. - Adjust donttouch_stat() to new types. - Adjust trace-events to new types. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2019-10-10 11:36:04 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Greg Kurz	1923923bfa	9p: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>	2018-12-12 14:18:10 +01:00
Greg Kurz	1d20398694	9p: fix QEMU crash when renaming files When using the 9P2000.u version of the protocol, the following shell command line in the guest can cause QEMU to crash: while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done With 9P2000.u, file renaming is handled by the WSTAT command. The v9fs_wstat() function calls v9fs_complete_rename(), which calls v9fs_fix_path() for every fid whose path is affected by the change. The involved calls to v9fs_path_copy() may race with any other access to the fid path performed by some worker thread, causing a crash like shown below: Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 59 while (*path && fd != -1) { (gdb) bt #0 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59 #1 0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8, path=0x0) at hw/9pfs/9p-local.c:92 #2 0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8, fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185 #3 0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498, path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53 #4 0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498) at hw/9pfs/9p.c:1083 #5 0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767) at util/coroutine-ucontext.c:116 #6 0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6 #7 0x0000000000000000 in () (gdb) The fix is to take the path write lock when calling v9fs_complete_rename(), like in v9fs_rename(). Impact: DoS triggered by unprivileged guest users. Fixes: CVE-2018-19489 Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-11-23 13:28:03 +01:00
Greg Kurz	5b3c77aa58	9p: take write lock on fid path updates (CVE-2018-19364) Recent commit `5b76ef50f6` fixed a race where v9fs_co_open2() could possibly overwrite a fid path with v9fs_path_copy() while it is being accessed by some other thread, ie, use-after-free that can be detected by ASAN with a custom 9p client. It turns out that the same can happen at several locations where v9fs_path_copy() is used to set the fid path. The fix is again to take the write lock. Fixes CVE-2018-19364. Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-11-20 13:00:35 +01:00
Keno Fischer	aca6897fba	9p: xattr: Properly translate xattrcreate flags As with unlinkat, these flags come from the client and need to be translated to their host values. The protocol values happen to match linux, but that need not be true in general. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Keno Fischer	67e8734574	9p: Properly check/translate flags in unlinkat The 9p-local code previously relied on P9_DOTL_AT_REMOVEDIR and AT_REMOVEDIR having the same numerical value and deferred any error checking to the syscall itself. However, while the former assumption is true on Linux, it is not true in general. 9p-handle, however, did this properly. Move the translation code to the generic 9p server code and add an error if unrecognized flags are passed. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Keno Fischer	a647502c58	9p: xattr: Fix crashes due to free of uninitialized value If the size returned from llistxattr/lgetxattr is 0, we skipped the malloc call, leaving xattr.value uninitialized. However, this value is later passed to `g_free` without any further checks, causing an error. Fix that by calling g_malloc unconditionally. If `size` is 0, it will return NULL, which is safe to pass to g_free. Signed-off-by: Keno Fischer <keno@juliacomputing.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-06-07 12:17:22 +02:00
Greg Kurz	8f9c64bfa5	9p: add trace event for v9fs_setattr() Don't print the tv_nsec part of atime and mtime, to stay below the 10 argument limit of trace events. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2018-05-02 08:59:24 +02:00
Marc-André Lureau	e446a1eb5e	9p: v9fs_path_copy() readability lhs/rhs don't tell much about how the arguments are handled; dst/src and const arguments are clearer in my mind. Use g_memdup() while at it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2018-02-19 18:27:15 +01:00
Greg Kurz	357e2f7f4e	tests: virtio-9p: add FLUSH operation test The idea is to send a victim request that will possibly block in the server and to send a flush request to cancel the victim request. This patch adds two tests to verify that: - the server does not reply to a victim request that was actually cancelled - the server replies to the flush request after replying to the victim request if it could not cancel it 9p request cancellation reference: http://man.cat-v.org/plan_9/5/flush Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> (groug, change the test to only write a single byte to avoid any alignment or endianness consideration)	2018-02-02 11:11:55 +01:00
Keno Fischer	fc78d5ee76	9pfs: Correctly handle cancelled requests # Background I was investigating spurious non-deterministic EINTR returns from various 9p file system operations in a Linux guest served from the qemu 9p server. ## EINTR, ERESTARTSYS and the linux kernel When a signal arrives that the Linux kernel needs to deliver to user-space while a given thread is blocked (in the 9p case waiting for a reply to its request in 9p_client_rpc -> wait_event_interruptible), it asks whatever driver is currently running to abort its current operation (in the 9p case causing the submission of a TFLUSH message) and return to user space. In these situations, the error message reported is generally ERESTARTSYS. If the userspace process specified SA_RESTART, this means that the system call will get restarted upon completion of the signal handler delivery (assuming the signal handler doesn't modify the process state in complicated ways not relevant here). If SA_RESTART is not specified, ERESTARTSYS gets translated to EINTR and user space is expected to handle the restart itself. ## The 9p TFLUSH command The 9p TFLUSH command requests that the server abort an ongoing operation. The man page [1] specifies: ``` If it recognizes oldtag as the tag of a pending transaction, it should abort any pending response and discard that tag. [...] When the client sends a Tflush, it must wait to receive the corresponding Rflush before reusing oldtag for subsequent messages. If a response to the flushed request is received before the Rflush, the client must honor the response as if it had not been flushed, since the completed request may signify a state change in the server ``` In particular, this means that the server must not send a reply with the original tag in response to the cancellation request, because the client is obligated to interpret such a reply as a coincidental reply to the original request. 
# The bug When qemu receives a TFlush request, it sets the `cancelled` flag on the relevant pdu. This flag is periodically checked, e.g. in `v9fs_co_name_to_path`, and if set, the operation is aborted and the error is set to EINTR. However, the server then violates the spec by returning to the client an Rerror response, rather than discarding the message entirely. As a result, the client is required to assume that said Rerror response is a result of the original request, not a result of the cancellation, and thus passes the EINTR error back to user space. This is not the worst thing it could do; however, as discussed above, the correct error code would have been ERESTARTSYS, such that user space programs with SA_RESTART set get correctly restarted upon completion of the signal handler. Instead, such programs get spurious EINTR results that they were not expecting to handle. It should be noted that there are plenty of user space programs that do not set SA_RESTART and do not correctly handle EINTR either. However, that is then a userspace bug. It should also be noted that this bug has been mitigated by a recent commit to the Linux kernel [2], which essentially prevents the kernel from sending Tflush requests unless the process is about to die (in which case the process likely doesn't care about the response). Nevertheless, for older kernels and to comply with the spec, I believe this change is beneficial. # Implementation The fix is fairly simple, just skipping notification of a reply if the pdu was previously cancelled. We do, however, also notify the transport layer that we're doing this, so it can clean up any resources it may be holding. I also added a new trace event to distinguish operations that caused an error reply from those that were cancelled. 
One complication is that we only omit sending the message on EINTR errors in order to avoid confusing the rest of the code (which may assume that a client knows about a fid if it successfully passed it off to pdu_complete without checking for cancellation status). This does mean that if the server acts upon the cancellation flag, it always needs to set err to EINTR. I believe this is true of the current code. [1] https://9fans.github.io/plan9port/man/man9/flush.html [2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc891 Signed-off-by: Keno Fischer <keno@juliacomputing.com> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, send a zero-sized reply instead of detaching the buffer] Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2018-02-01 21:21:27 +01:00
Greg Kurz	066eb006b5	9pfs: drop v9fs_register_transport() No good reasons to do this outside of v9fs_device_realize_common(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2018-02-01 21:21:27 +01:00
Greg Kurz	65603a801e	fsdev: improve error handling of backend init This patch changes some error messages in the backend init code and convert backends to propagate QEMU Error objects instead of calling error_report(). One notable improvement is that the local backend now provides a more detailed error report when it fails to open the shared directory. Signed-off-by: Greg Kurz <groug@kaod.org>	2018-01-08 11:18:23 +01:00
Greg Kurz	7567359094	9pfs: make pdu_marshal() and pdu_unmarshal() static functions They're only used by the 9p core code. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-01-08 11:18:22 +01:00
Greg Kurz	d1471233bb	9pfs: fix error path in pdu_submit() If we receive an unsupported request id, we first decide to return -ENOTSUPP to the client, but since the request id causes is_read_only_op() to return false, we change the error to be -EROFS if the fsdev is read-only. This doesn't make sense since we don't know what the client asked for. This patch ensures that -EROFS can only be returned if the request id is supported. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-01-08 11:18:22 +01:00
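A sketch of the corrected ordering, with hypothetical request ids and helpers: the "not supported" check wins, and -EROFS is considered only for supported, non-read-only operations on a read-only fsdev. The commit text says -ENOTSUPP; since that is not a standard userspace errno, the sketch uses POSIX `EOPNOTSUPP` as a stand-in.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request ids; real 9p ids differ. */
enum { OP_READ = 1, OP_WRITE = 2, OP_UNSUPPORTED = 99 };

static bool op_supported(int id) { return id == OP_READ || id == OP_WRITE; }
static bool op_read_only(int id) { return id == OP_READ; }

/* Returns the error for a request, or 0 if it may be dispatched. */
static int submit_error(int id, bool fs_read_only)
{
    if (!op_supported(id)) {
        return -EOPNOTSUPP;   /* never overridden by the read-only check */
    }
    if (fs_read_only && !op_read_only(id)) {
        return -EROFS;        /* only for requests we actually understand */
    }
    return 0;
}
```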
Greg Kurz	8e71b96c62	9pfs: fix some type definitions To comply with the QEMU coding style. Signed-off-by: Greg Kurz <groug@kaod.org>	2018-01-08 11:18:22 +01:00
Greg Kurz	267fcadf32	9pfs: fix v9fs_mark_fids_unreclaim() return value The return value of v9fs_mark_fids_unreclaim() is then propagated to pdu_complete(). It should be a negative errno, not -1. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-11-06 18:05:35 +01:00
Prasad J Pandit	7bd9275630	9pfs: use g_malloc0 to allocate space for xattr The 9p back-end first queries the size of an extended attribute, allocates space for it via g_malloc() and then retrieves its value into the allocated buffer. A race between querying the attribute size and retrieving its value could lead to disclosure of memory bytes. Use g_malloc0() to avoid it. Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-10-16 14:21:59 +02:00
Jan Dakinevich	772a73692e	9pfs: check the size of transport buffer before marshaling v9fs_do_readdir_with_stat() should check for a maximum buffer size before an attempt to marshal gathered data. Otherwise, the buffers would be considered misconfigured and the transport would be broken. The patch brings v9fs_do_readdir_with_stat() into conformity with v9fs_do_readdir() behavior. Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused by commit `8d37de41ca` # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-20 08:48:52 +02:00
Jan Dakinevich	4d8bc7334b	9pfs: fix name_to_path assertion in v9fs_complete_rename() The third parameter of v9fs_co_name_to_path() must not contain `/' character. The issue is most likely related to 9p2000.u protocol only. Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused by commit `f57f587857` # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-20 08:48:52 +02:00
Jan Dakinevich	6069537f43	9pfs: fix readdir() for 9p2000.u If the client is using 9p2000.u, the following occurs: $ cd ${virtfs_shared_dir} $ mkdir -p a/b/c $ ls a/b ls: cannot access 'a/b/a': No such file or directory ls: cannot access 'a/b/b': No such file or directory a b c instead of the expected: $ ls a/b c This is a regression introduced by commit f57f5878578a; local_name_to_path() now resolves ".." and "." in paths, and v9fs_do_readdir_with_stat()->stat_to_v9stat() then copies the basename of the resulting path to the response. With the example above, this means that "." and ".." are turned into "b" and "a" respectively... stat_to_v9stat() currently assumes it is passed a full canonicalized path and uses it to do two different things: 1) to pass it to v9fs_co_readlink() in case the file is a symbolic link 2) to set the name field of the V9fsStat structure to the basename part of the given path It only has two users: v9fs_stat() and v9fs_do_readdir_with_stat(). v9fs_stat() really needs 1) and 2) to be performed since it starts with the full canonicalized path stored in the fid. It is different for v9fs_do_readdir_with_stat() though because the name we want to put into the V9fsStat structure is the d_name field of the dirent actually (ie, we want to keep the "." and ".." special names). So, we only need 1) in this case. This patch hence adds a basename argument to stat_to_v9stat(), to be used to set the name field of the V9fsStat structure, and moves the basename logic to v9fs_stat(). Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> (groug, renamed old name argument to path and updated changelog) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-20 08:48:51 +02:00
Philippe Mathieu-Daudé	403a905b03	9pfs: avoid sign conversion error simplifying the code (note this is how other functions also handle the errors). hw/9pfs/9p.c:948:18: warning: Loss of sign in implicit conversion offset = err; ^~~ Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-09-05 14:01:16 +02:00
Alistair Francis	3dc6f86936	Convert error_report() to warn_report() Convert all uses of error_report("warning:"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these two commands: find ./* -type f -exec sed -i \ 's|error_report(".*warning[,:] |warn_report("|Ig' {} + Indentation fixed up manually afterwards. The test-qdev-global-props test case was manually updated to ensure that this patch passes make check (as the test cases are case sensitive). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Thomas Huth <thuth@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Lieven <pl@kamp.de> Cc: Josh Durgin <jdurgin@redhat.com> Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Greg Kurz <groug@kaod.org> Cc: Rob Herring <robh@kernel.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Peter Chubb <peter.chubb@nicta.com.au> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Greg Kurz <groug@kaod.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au> Acked-by: Max Reitz <mreitz@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-07-13 13:49:58 +02:00
Greg Kurz	06a37db7b1	9pfs: handle transport errors in pdu_complete() Contrary to what is written in the comment, a buggy guest can misconfigure the transport buffers and pdu_marshal() may return an error. If this ever happens, it is up to the transport layer to handle the situation (9P is transport agnostic). This fixes Coverity issue CID1348518. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-06-29 15:11:51 +02:00
Greg Kurz	8d37de41ca	virtio-9p: break device if buffers are misconfigured The 9P protocol is transport agnostic: if the guest misconfigured the buffers, the best we can do is to set the broken flag on the device. Signed-off-by: Greg Kurz <groug@kaod.org>	2017-06-29 15:11:51 +02:00
Tobias Schramm	b96feb2cb9	9pfs: local: Add support for custom fmode/dmode in 9pfs mapped security modes In mapped security modes, files are created with very restrictive permissions (600 for files and 700 for directories). This makes file sharing between virtual machines and users on the host rather complicated. Imagine, e.g., a group of users that need to access data produced by processes on a virtual machine. Giving those users access to the data will be difficult since the group access mode is always 0. This patch makes the default mode for both files and directories configurable. Existing setups that don't know about the new parameters keep using the current secure behavior. Signed-off-by: Tobias Schramm <tobleminer@gmail.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-06-29 15:11:50 +02:00
Greg Kurz	4fa62005d0	9pfs: check return value of v9fs_co_name_to_path() These v9fs_co_name_to_path() call sites have always been around. I guess no care was taken to check the return value because the name_to_path operation could never fail at the time. This is no longer true: the handle and synth backends can already fail this operation, and so will the local backend soon. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-25 10:30:14 +02:00
Greg Kurz	a17d8659c4	9pfs: drop pdu_push_and_notify() Only pdu_complete() needs to notify the client that a request has completed. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-05-25 10:30:13 +02:00
Greg Kurz	506f327582	virtio-9p/xen-9p: move 9p specific bits to core 9p code These bits aren't related to the transport so let's move them to the core code. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>	2017-05-25 10:30:13 +02:00
Juan Quintela	795c40b8bd	migration: Create migration/blocker.h This allows us to remove lots of includes of migration/migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 12:04:59 +02:00
Greg Kurz	6d54af0ea9	9pfs: clear migration blocker at session reset The migration blocker survives a device reset: if the guest mounts a 9p share and then gets rebooted with system_reset, it will be unmigratable until it remounts and umounts the 9p share again. This happens because the migration blocker is supposed to be cleared when we put the last reference on the root fid, but virtfs_reset() wrongly calls free_fid() instead of put_fid(). This patch fixes virtfs_reset() so that it honors the way fids are supposed to be manipulated: first get a reference and later put it back when you're done. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Li Qiang <liqiang6-s@360.cn>	2017-04-04 18:06:01 +02:00
Greg Kurz	18adde86dd	9pfs: fix multiple flush for same request If a client tries to flush the same outstanding request several times, only the first flush completes. Subsequent ones keep waiting for the request completion in v9fs_flush() and, therefore, leak a PDU. This will cause QEMU to hang when draining active PDUs the next time the device is reset. Let's have each flush request wake up the next one if any. The last waiter frees the cancelled PDU. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-04-04 18:06:01 +02:00
Li Qiang	d63fb193e7	9pfs: fix file descriptor leak The v9fs_create() and v9fs_lcreate() functions are used to create a file on the backend and to associate it to a fid. The fid shouldn't be already in-use, otherwise both functions may silently leak a file descriptor or allocated memory. The current code doesn't check that. This patch ensures that the fid isn't already associated to anything before using it. Signed-off-by: Li Qiang <liqiang6-s@360.cn> (reworded the changelog, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-03-27 21:13:19 +02:00
Greg Kurz	d5f2af7b95	9pfs: don't try to flush self and avoid QEMU hang on reset According to the 9P spec [*], when a client wants to cancel a pending I/O request identified by a given tag (uint16), it must send a Tflush message and wait for the server to respond with a Rflush message before reusing this tag for another I/O. The server may still send a completion message for the I/O if it wasn't actually cancelled but the Rflush message must arrive after that. QEMU hence waits for the flushed PDU to complete before sending the Rflush message back to the client. If a client sends 'Tflush tag oldtag' and tag == oldtag, QEMU will then allocate a PDU identified by tag, find it in the PDU list and wait for this same PDU to complete... i.e. wait for a completion that will never happen. This causes a tag and ring slot leak in the guest, and a PDU leak in QEMU, all of them limited by the maximal number of PDUs (128). But, worse, this causes QEMU to hang on device reset since v9fs_reset() wants to drain all pending I/O. This insane behavior is likely to denote a bug in the client, and it would deserve an Rerror message to be sent back. Unfortunately, the protocol allows it and requires all flush requests to succeed (only an Rflush response is expected). The only option is to detect when we have to handle a self-referencing flush request and report success to the client right away. [*] http://man.cat-v.org/plan_9/5/flush Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Greg Kurz <groug@kaod.org>	2017-03-21 09:12:47 +01:00
Pradeep Jagadeesh	b8bbdb886e	fsdev: add IO throttle support to fsdev devices This patchset adds the throttle support for the 9p-local driver. For now this functionality can be enabled only through qemu cli options. QMP interface and support to other drivers need further extensions. To make it simple for other 9p drivers, the throttle code has been put in separate files. Signed-off-by: Pradeep Jagadeesh <pradeep.jagadeesh@huawei.com> Reviewed-by: Alberto Garcia <berto@igalia.com> (pass extra NULL CoMutex * argument to qemu_co_queue_wait(), added options to qemu-options.hx, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-02-28 10:31:46 +01:00
Paolo Bonzini	4bae2b397f	9pfs: fix v9fs_lock error case In this case, we are marshaling an error status instead of the errno value. Reorganize the out and out_nofid labels to look like all the other cases. Coverity reports this because the "err = -ENOENT" and "err = -EINVAL" assignments above are dead, overwritten by the call to pdu_marshal. (Coverity issues CID1348512 and CID1348513) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (also open-coded the success path since locking is a nop for us, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org>	2017-02-28 10:31:46 +01:00
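As a rough illustration of the flush rules described in commit d5f2af7b95, the C sketch below models the decision a server faces on receiving 'Tflush tag oldtag'. The `Request` type, the `FlushAction` enum and `handle_tflush()` are hypothetical simplifications for illustration only, not QEMU's actual `v9fs_flush()` code. A self-referencing flush is answered immediately instead of waiting on itself, a flush that targets a pending request marks it cancelled and waits for it, and a flush for an unknown tag has nothing to wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified in-flight 9P request (not QEMU's V9fsPDU). */
typedef struct Request {
    uint16_t tag;    /* 9P tags are 16-bit values chosen by the client */
    bool cancelled;  /* set when a Tflush targets this request */
} Request;

typedef enum {
    FLUSH_REPLY_NOW,    /* send Rflush right away */
    FLUSH_WAIT_FOR_OLD  /* wait for the old request to complete, then Rflush */
} FlushAction;

/*
 * Decide how to handle "Tflush tag oldtag" given the pending requests.
 * The protocol only allows an Rflush reply, so every path ends in success.
 */
static FlushAction handle_tflush(uint16_t tag, uint16_t oldtag,
                                 Request *pending, size_t npending)
{
    assert(pending != NULL || npending == 0);

    /* Self-referencing flush: waiting here would wait on ourselves forever. */
    if (tag == oldtag) {
        return FLUSH_REPLY_NOW;
    }

    for (size_t i = 0; i < npending; i++) {
        if (pending[i].tag == oldtag) {
            pending[i].cancelled = true;
            return FLUSH_WAIT_FOR_OLD;
        }
    }

    /* No request with oldtag is in flight: nothing to wait for. */
    return FLUSH_REPLY_NOW;
}
```

The early `tag == oldtag` return is the key point: the flush request itself sits in the pending list under `tag`, so without this check the lookup would find the flush and wait for its own completion.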


98 Commits