qemu-e2k

Author	SHA1	Message	Date
Christian Schoenebeck	f83df00900	9pfs: fix crash in v9fs_walk() v9fs_walk() utilizes the v9fs_co_run_in_worker({...}) macro to run the supplied fs driver code block on a background worker thread. When either the 'Twalk' client request was interrupted or if the client requested fid for that 'Twalk' request caused a stat error then that fs driver code block was left by 'break' keyword, with the intention to return from worker thread back to main thread as well: v9fs_co_run_in_worker({ if (v9fs_request_cancelled(pdu)) { err = -EINTR; break; } err = s->ops->lstat(&s->ctx, &dpath, &fidst); if (err < 0) { err = -errno; break; } ... }); However that 'break;' statement also skipped the v9fs_co_run_in_worker() macro's final and mandatory /* re-enter back to qemu thread */ qemu_coroutine_yield(); call and thus caused the rest of v9fs_walk() to continue executing on the worker thread instead of main thread, eventually leading to a crash in the virtio transport driver. To fix this issue and to prevent the same error from happening again by other users of v9fs_co_run_in_worker() in future, auto wrap the supplied code block into its own do { } while (0); loop inside the 'v9fs_co_run_in_worker' macro definition. Full discussion and backtrace: https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg05209.html https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00174.html Fixes: `8d6cb10073` Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Cc: qemu-stable@nongnu.org Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <E1mLTBg-0002Bh-2D@lizzy.crudebyte.com>	2021-09-02 13:26:22 +02:00
Christian Schoenebeck	869605b5a0	hw/9pfs: use g_autofree in v9fs_walk() where possible Suggested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <b51670d2a39399535a035f6bc77c3cbeed85edae.1629208359.git.qemu_oss@crudebyte.com>	2021-09-02 13:26:22 +02:00
Christian Schoenebeck	97b1d8fdf6	hw/9pfs: avoid 'path' copy in v9fs_walk() The v9fs_walk() function resolves all client submitted path nodes to the local 'pathes' array. Using a separate string scalar variable 'path' inside the background worker thread loop and copying that local 'path' string scalar variable subsequently to the 'pathes' array (at the end of each loop iteration) is not necessary. Instead simply resolve each path directly to the 'pathes' array and don't use the string scalar variable 'path' inside the fs worker thread loop at all. The only advantage of the 'path' scalar was that in case of an error the respective 'pathes' element would not be filled. Right now this is not an issue as the v9fs_walk() function returns as soon as any error occurs. Suggested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <7dacbecf25b2c9b4a0ce12d689a8a535f09a31e3.1629208359.git.qemu_oss@crudebyte.com>	2021-09-02 13:26:22 +02:00
Christian Schoenebeck	8d6cb10073	9pfs: reduce latency of Twalk As with previous performance optimization on Treaddir handling; reduce the overall latency, i.e. overall time spent on processing a Twalk request by reducing the amount of thread hops between the 9p server's main thread and fs worker thread(s). In fact this patch even reduces the thread hops for Twalk handling to its theoretical minimum of exactly 2 thread hops: main thread -> fs worker thread -> main thread This is achieved by doing all the required fs driver tasks altogether in a single v9fs_co_run_in_worker({ ... }); code block. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <1a6701674afc4f08d40396e3aa2631e18a4dbb33.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	66550339b7	9pfs: drop root_qid There is no longer a user of root_qid, so drop it. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <6896dd161d3257db6b0513842a14f87ca191fdf6.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	f22cad4228	9pfs: replace not_same_qid() by same_stat_id() As we are actually only comparing the filesystem ID (i.e. device number and inode number pair) let's use the POSIX stat buffer instead of QIDs, because resolving QIDs requires to be done on 9p server's main thread only as it might mutate the server state if inode remapping is enabled. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <26aa465ff9cc9c07e053331554a02fdae3994417.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	1d0fc0d0ee	9pfs: drop fid_to_qid() There is only one user of fid_to_qid() which is v9fs_walk(). Let's open-code fid_to_qid() directly within v9fs_walk(), because fid_to_qid() hides the POSIX stat buffer which we are going to need in the subsequent patch. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <e9a4c9c7a0792ed4db6578d105a0823ea05bc324.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	110243750d	9pfs: capture root stat We already capture the QID of the exported 9p root path, i.e. to prevent client access outside the defined, exported filesystem's tree. This is currently checked by comparing the root QID with another FID's QID. The problem with the latter is that resolving a QID of any given 9p path can only be done on 9p server's main thread, that's because it might mutate the server's state if inode remapping is enabled. For that reason also capture the POSIX stat info of the root path for being able to identify on any (e.g. worker) thread whether an arbitrary given path is identical to the export root. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <eb07d6c2e9925788454cfe33d3802e4ffb23ea9a.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	8bf27550ef	9pfs: fix not_same_qid() There is only one user of not_same_qid() which is v9fs_walk() and the latter is using it for comparing a client supplied path with the 9p export root path, for the sole purpose to prevent a Twalk request from escaping from the exported 9p tree via "..". However for that specific purpose the implementation of not_same_qid() is wrong; if mtime of the 9p export root path changed between Tattach and Twalk then not_same_qid() returns true when actually comparing against the export root path. To fix for the actual semantic being used, only compare QID path members, but do not compare version or type members. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <ca0abae4a899d81c6e87f683732d6c1f56915232.1622821729.git.qemu_oss@crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	232a4d2c25	9pfs: simplify v9fs_walk() There is only one comparison between nwnames and P9_MAXWELEM required. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <E1liKiz-0006BC-Ja@lizzy.crudebyte.com>	2021-07-05 13:03:16 +02:00
Christian Schoenebeck	6f56908427	9pfs: add link to 9p developer docs To lower the entry level for new developers, add a link to the 9p developer docs (i.e. qemu wiki) to MAINTAINERS and to the beginning of 9p source files, that is to: https://wiki.qemu.org/Documentation/9p Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Acked-by: Greg Kurz <groug@kaod.org> Message-Id: <E1leeDf-0008GZ-9q@lizzy.crudebyte.com>	2021-07-05 13:03:16 +02:00
Stefano Garzarella	d0fb9657a3	docs: fix references to docs/devel/tracing.rst Commit `e50caf4a5c` ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Mahmoud Mandour	e4fd889f51	hw/9pfs/9p-synth: Replaced qemu_mutex_lock with QEMU_LOCK_GUARD Replaced a call to qemu_mutex_lock and its respective call to qemu_mutex_unlock and used QEMU_LOCK_GUARD macro in their place. This simplifies the code by removing the call required to unlock and also eliminates goto paths. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Acked-by: Greg Kurz <groug@kaod.org> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210311031538.5325-9-ma.mandourr@gmail.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2021-03-16 11:41:49 +01:00
Chen Qun	d6eb39b554	qtest: delete superfluous inclusions of qtest.h There are 23 files that include the "sysemu/qtest.h", but they do not use any qtest functions. Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210226081414.205946-1-kuhn.chenqun@huawei.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-03-09 06:03:53 +01:00
Greg Kurz	81f9766b7a	9pfs: Convert reclaim list to QSLIST Use QSLIST instead of open-coding for a slightly improved readability. No behavioral change. Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210122143514.215780-1-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-22 18:26:40 +01:00
Greg Kurz	20b7f45b22	9pfs: Improve unreclaim loop If a fid was actually re-opened by v9fs_reopen_fid(), we re-traverse the fid list from the head in case some other request created a fid that needs to be marked unreclaimable as well (i.e. the client opened a new handle on the path that is being unlinked). This is suboptimal since most if not all fids that require it have likely been taken care of already. This is mostly the result of new fids being added to the head of the list. Since the list is now a QSIMPLEQ, add new fids at the end instead to avoid the need to rewind. Take a reference on the fid to ensure it doesn't go away during v9fs_reopen_fid() and that it can be safely passed to QSIMPLEQ_NEXT() afterwards. Since the associated put_fid() can also yield, same is done with the next fid. So the logic here is to get a reference on a fid and only put it back during the next iteration after we could get a reference on the next fid. Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210121181510.1459390-1-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-22 15:17:19 +01:00
Greg Kurz	feabd6cf78	9pfs: Convert V9fsFidState::fid_list to QSIMPLEQ The fid_list is currently open-coded. This doesn't seem to serve any purpose that cannot be met with QEMU's generic lists. Let's go for a QSIMPLEQ : this will allow to add new fids at the end of the list and to improve the logic in v9fs_mark_fids_unreclaim(). Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210118142300.801516-3-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-21 17:49:45 +01:00
Greg Kurz	2e53160fc6	9pfs: Convert V9fsFidState::clunked to bool This can only be 0 or 1. Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20210118142300.801516-2-groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-21 17:49:45 +01:00
Greg Kurz	acef3f8b47	9pfs/proxy: Check return value of proxy_marshal() This should always successfully write exactly two 32-bit integers. Make it clear with an assert(), like v9fs_receive_status() and v9fs_receive_response() already do when unmarshalling the same header. Fixes: Coverity CID 1438968 Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <161035859647.1221144.4691749806675653934.stgit@bahia.lan> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-21 17:49:45 +01:00
Greg Kurz	89fbea8737	9pfs: Fully restart unreclaim loop (CVE-2021-20181) Depending on the client activity, the server can be asked to open a huge number of file descriptors and eventually hit RLIMIT_NOFILE. This is currently mitigated using a reclaim logic : the server closes the file descriptors of idle fids, based on the assumption that it will be able to re-open them later. This assumption doesn't hold of course if the client requests the file to be unlinked. In this case, we loop on the entire fid list and mark all related fids as unreclaimable (the reclaim logic will just ignore them) and, of course, we open or re-open their file descriptors if needed since we're about to unlink the file. This is the purpose of v9fs_mark_fids_unreclaim(). Since the actual opening of a file can cause the coroutine to yield, another client request could possibly add a new fid that we may want to mark as non-reclaimable as well. The loop is thus restarted if the re-open request was actually transmitted to the backend. This is achieved by keeping a reference on the first fid (head) before traversing the list. This is wrong in several ways: - a potential clunk request from the client could tear the first fid down and cause the reference to be stale. This leads to a use-after-free error that can be detected with ASAN, using a custom 9p client - fids are added at the head of the list : restarting from the previous head will always miss fids added by some other potential request All these problems could be avoided if fids were being added at the end of the list. This can be achieved with a QSIMPLEQ, but this is probably too much change for a bug fix. For now let's keep it simple and just restart the loop from the current head. Fixes: CVE-2021-20181 Buglink: https://bugs.launchpad.net/qemu/+bug/1911666 Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-Id: <161064025265.1838153.15185571283519390907.stgit@bahia.lan> Signed-off-by: Greg Kurz <groug@kaod.org>	2021-01-15 08:44:28 +01:00
Philippe Mathieu-Daudé	e6b99460b1	hw/9pfs: Fix Kconfig dependency problem between 9pfs and Xen Commit `b2c00bce54` ("meson: convert hw/9pfs, cleanup") introduced CONFIG_9PFS (probably a wrong conflict resolution). This config is not used anywhere. Backends depend on CONFIG_FSDEV_9P which itself depends on CONFIG_VIRTFS. Remove the invalid CONFIG_9PFS and use CONFIG_FSDEV_9P instead, to fix the './configure --without-default-devices --enable-xen' build: /usr/bin/ld: libcommon.fa.p/hw_xen_xen-legacy-backend.c.o: in function `xen_be_register_common': hw/xen/xen-legacy-backend.c:754: undefined reference to `xen_9pfs_ops' /usr/bin/ld: libcommon.fa.p/fsdev_qemu-fsdev.c.o:(.data.rel+0x8): undefined reference to `local_ops' /usr/bin/ld: libcommon.fa.p/fsdev_qemu-fsdev.c.o:(.data.rel+0x20): undefined reference to `synth_ops' /usr/bin/ld: libcommon.fa.p/fsdev_qemu-fsdev.c.o:(.data.rel+0x38): undefined reference to `proxy_ops' collect2: error: ld returned 1 exit status Fixes: `b2c00bce54` ("meson: convert hw/9pfs, cleanup") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20201104115706.3101190-3-philmd@redhat.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-11-05 15:21:11 +01:00
Xinhao Zhang	22e1367587	hw/9pfs : add space before the open parenthesis '(' Fix code style. Space required before the open parenthesis '('. Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com> Signed-off-by: Kai Deng <dengkai1@huawei.com> Reported-by: Euler Robot <euler.robot@huawei.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20201030043515.1030223-3-zhangxinhao1@huawei.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-11-05 15:14:03 +01:00
Xinhao Zhang	487729e9f6	hw/9pfs : open brace '{' following struct go on the same line Fix code style. Open braces for struct should go on the same line. Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com> Signed-off-by: Kai Deng <dengkai1@huawei.com> Reported-by: Euler Robot <euler.robot@huawei.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20201030043515.1030223-2-zhangxinhao1@huawei.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-11-05 15:14:03 +01:00
Xinhao Zhang	01011733ea	hw/9pfs : add spaces around operator Fix code style. Operator needs spaces both sides. Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com> Signed-off-by: Kai Deng <dengkai1@huawei.com> Reported-by: Euler Robot <euler.robot@huawei.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20201030043515.1030223-1-zhangxinhao1@huawei.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-11-05 15:14:03 +01:00
Christian Schoenebeck	b036d9ac69	9pfs: suppress performance warnings on qtest runs Don't trigger any performance warning if we're just running test cases, because tests intentionally run for edge cases. So far performance warnings were suppressed for the 'synth' fs driver backend only. This patch suppresses them for all 9p fs driver backends. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <a2d2ff2163f8853ea782a7a1d4e6f2afd7c29ffe.1603106145.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-10-19 14:25:40 +02:00
Eduardo Habkost	8063396bf3	Use OBJECT_DECLARE_SIMPLE_TYPE when possible This converts existing DECLARE_INSTANCE_CHECKER usage to OBJECT_DECLARE_SIMPLE_TYPE when possible. $ ./scripts/codeconverter/converter.py -i \ --pattern=AddObjectDeclareSimpleType $(git grep -l '' -- '*.[ch]') Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paul Durrant <paul@xen.org> Message-Id: <20200916182519.415636-6-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-18 14:12:32 -04:00
Christian Schoenebeck	c418f935ac	9pfs: disable msize warning for synth driver Previous patch introduced a performance warning being logged on host side if client connected with an 'msize' <= 8192. Disable this performance warning for the synth driver to prevent that warning from being printed whenever the 9pfs (qtest) test cases are running. Introduce a new export flag V9FS_NO_PERF_WARN for that purpose, which might also be used to disable such warnings from the CLI in future. We could have also prevented the warning by simply raising P9_MAX_SIZE in virtio-9p-test.c to any value larger than 8192, however in the context of test cases it makes sense running for edge cases, which includes the lowest 'msize' value supported by the server which is 4096, hence we want to preserve an msize of 4096 for the test client. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <E1kEyDy-0006nN-5A@lizzy.crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-09-15 12:12:03 +02:00
Christian Schoenebeck	62777d825b	9pfs: log warning if msize <= 8192 It is essential to choose a reasonable high value for 'msize' to avoid severely degraded file I/O performance. This parameter can only be chosen on client/guest side, and a Linux client defaults to an 'msize' of only 8192 if the user did not explicitly specify a value for 'msize', which results in very poor file I/O performance. Unfortunately many users are not aware that they should specify an appropriate value for 'msize' to avoid severe performance issues, so log a performance warning (with a QEMU wiki link explaining this issue in detail) on host side in that case to make it more clear. Currently a client cannot automatically pick a reasonable value for 'msize', because a good value for 'msize' depends on the file I/O potential of the underlying storage on host side, i.e. a feature invisible to the client, and even then a user would still need to trade off between performance profit and additional RAM costs, i.e. with growing 'msize' (RAM occupation), performance still increases, but performance delta will shrink continuously. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <e6fc84845c95816ad5baecb0abd6bfefdcf7ec9f.1599144062.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-09-15 12:12:03 +02:00
Eduardo Habkost	8110fa1d94	Use DECLARE_CHECKER macros Generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]') Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-12-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-13-ehabkost@redhat.com> Message-Id: <20200831210740.126168-14-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-09 09:27:09 -04:00
Eduardo Habkost	db1015e92e	Move QOM typedefs and add missing includes Some typedefs and macros are defined after the type check macros. This makes it difficult to automatically replace their definitions with OBJECT_DECLARE_TYPE. Patch generated using: $ ./scripts/codeconverter/converter.py -i \ --pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]') which will split "typedef struct { ... } TypedefName" declarations. Followed by: $ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \ $(git grep -l '' -- '*.[ch]') which will: - move the typedefs and #defines above the type check macros - add missing #include "qom/object.h" lines if necessary Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-9-ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20200831210740.126168-10-ehabkost@redhat.com> Message-Id: <20200831210740.126168-11-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-09-09 09:26:43 -04:00
Peter Maydell	30aa19446d	9pfs: Fix severe performance issue of Treaddir requests. Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20200812' into staging 9pfs: Fix severe performance issue of Treaddir requests. # gpg: Signature made Wed 12 Aug 2020 11:06:21 BST # gpg: using RSA key 96D8D110CF7AF8084F88590134C2B58765A47395 # gpg: issuer "qemu_oss@crudebyte.com" # gpg: Good signature from "Christian Schoenebeck <qemu_oss@crudebyte.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: ECAB 1A45 4014 1413 BA38 4926 30DB 47C3 A012 D5F4 # Subkey fingerprint: 96D8 D110 CF7A F808 4F88 5901 34C2 B587 65A4 7395 * remotes/cschoenebeck/tags/pull-9p-20200812: 9pfs: clarify latency of v9fs_co_run_in_worker() 9pfs: differentiate readdir lock between 9P2000.u vs. 9P2000.L 9pfs: T_readdir latency optimization 9pfs: add new function v9fs_co_readdir_many() 9pfs: split out fs driver core of v9fs_co_readdir() 9pfs: make v9fs_readdir_response_size() public tests/virtio-9p: added split readdir tests Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-08-24 16:39:53 +01:00
Marc-André Lureau	b2c00bce54	meson: convert hw/9pfs, cleanup hw/Makefile.objs is gone so there is more code that can be removed. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:30:33 -04:00
Paolo Bonzini	243af0225a	trace: switch position of headers to what Meson requires Meson doesn't enjoy the same flexibility we have with Make in choosing the include path. In particular the tracing headers are using $(build_root)/$(<D). In order to keep the include directives unchanged, the simplest solution is to generate headers with patterns like "trace/trace-audio.h" and place forwarding headers in the source tree such that for example "audio/trace.h" includes "trace/trace-audio.h". This patch is too ugly to be applied to the Makefiles now. It's only a way to separate the changes to the tracing header files from the Meson rewrite of the tracing logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:18:24 -04:00
Christian Schoenebeck	da9f2eda25	9pfs: clarify latency of v9fs_co_run_in_worker() As we just fixed a severe performance issue with Treaddir request handling, clarify this overall issue as a comment on v9fs_co_run_in_worker() with the intention to hopefully prevent such performance mistakes in future (and fixing other yet outstanding ones). Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <4d34d332e1aaa8a2cf8dc0b5da4fd7727f2a86e8.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Christian Schoenebeck	d2c5cf7ca1	9pfs: differentiate readdir lock between 9P2000.u vs. 9P2000.L Previous patch suggests that it might make sense to use a different mutex type now while handling readdir requests, depending on the precise protocol variant, as v9fs_do_readdir_with_stat() (used by 9P2000.u) uses a CoMutex to avoid deadlocks that might happen with QemuMutex otherwise, whereas do_readdir_many() (used by 9P2000.L) should better use a QemuMutex, as the precise behaviour of a failed CoMutex lock on fs driver side would not be clear. And to avoid the wrong lock type being used, be now strict and error out if a 9P2000.L client sends a Tread on a directory, and likewise error out if a 9P2000.u client sends a Treaddir request. This patch is just intended as transitional measure, as 9P2000.u vs. 9P2000.L implementations currently differ where the main logic of fetching directory entries is located at (9P2000.u still being more top half focused, while 9P2000.L already being bottom half focused in regards to fetching directory entries that is). Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <9a2ddc347e533b0d801866afd9dfac853d2d4106.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Christian Schoenebeck	0c4356ba7d	9pfs: T_readdir latency optimization Make top half really top half and bottom half really bottom half: Each T_readdir request handling is hopping between threads (main I/O thread and background I/O driver threads) several times for every individual directory entry, which sums up to huge latencies for handling just a single T_readdir request. Instead of doing that, collect now all required directory entries (including all potentially required stat buffers for each entry) in one rush on a background I/O thread from fs driver by calling the previously added function v9fs_co_readdir_many() instead of v9fs_co_readdir(), then assemble the entire resulting network response message for the readdir request on main I/O thread. The fs driver is still aborting the directory entry retrieval loop (on the background I/O thread inside of v9fs_co_readdir_many()) as soon as it would exceed the client's requested maximum R_readdir response size. So this will not introduce a performance penalty on another end. Also: No longer seek initial directory position in v9fs_readdir(), as this is now handled (more consistently) by v9fs_co_readdir_many() instead. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <c7c3d1cf4e86611538cef44897842819d9359d7a.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Christian Schoenebeck	2149675b19	9pfs: add new function v9fs_co_readdir_many() The newly added function v9fs_co_readdir_many() retrieves multiple directory entries with a single fs driver request. It is intended to replace uses of v9fs_co_readdir(), the latter only retrieves a single directory entry per fs driver request instead. The reason for this planned replacement is that for every fs driver request the coroutine is dispatched from main I/O thread to a background I/O thread and eventually dispatched back to main I/O thread. Hopping between threads adds latency. So if a 9pfs Treaddir request reads a large amount of directory entries, this currently sums up to huge latencies of several hundred ms or even more. So using v9fs_co_readdir_many() instead of v9fs_co_readdir() will provide significant performance improvements. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <73dc827a12ef577ae7e644dcf34a5c0e443ab42f.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Christian Schoenebeck	dd8151f4fe	9pfs: split out fs driver core of v9fs_co_readdir() The implementation of v9fs_co_readdir() has two parts: the outer part is executed by main I/O thread, whereas the inner part is executed by fs driver on a background I/O thread. Move the inner part to its own new, private function do_readdir(), so it can be shared by another upcoming new function. This is just a preparatory patch for the subsequent patch, with the purpose to avoid the next patch to clutter the overall diff. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <a426ee06e77584fa2d8253ce5d8bea519eb3ffd4.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Christian Schoenebeck	29c9d2ca80	9pfs: make v9fs_readdir_response_size() public Rename function v9fs_readdir_data_size() -> v9fs_readdir_response_size() and make it callable from other units. So far this function is only used by 9p.c, however subsequent patches require the function to be callable from another 9pfs unit. And as we're at it; also make it clear for what this function is used for. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <3668ebc7d5b929a0e4f1357457060d96f50f76f4.1596012787.git.qemu_oss@crudebyte.com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>	2020-08-12 09:17:32 +02:00
Vladimir Sementsov-Ogievskiy	92c451222c	virtio-9p: Use ERRP_GUARD() If we want to check error after errp-function call, we need to introduce local_err and then propagate it to errp. Instead, use the ERRP_GUARD() macro, benefits are: 1. No need of explicit error_propagate call 2. No need of explicit local_err variable: use errp directly 3. ERRP_GUARD() leaves errp as is if it's not NULL or &error_fatal, this means that we don't break error_abort (we'll abort on error_set, not on error_propagate) If we want to add some info to errp (by error_prepend() or error_append_hint()), we must use the ERRP_GUARD() macro. Otherwise, this info will not be added when errp == &error_fatal (the program will exit prior to the error_append_hint() or error_prepend() call). Fix such a case in v9fs_device_realize_common(). This commit is generated by command sed -n '/^virtio-9p$/,/^$/{s/^F: //p}' MAINTAINERS | \ xargs git ls-files | grep '\.[hc]$' | \ xargs spatch \ --sp-file scripts/coccinelle/errp-guard.cocci \ --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 Reported-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Greg Kurz <groug@kaod.org> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200707165037.1026246-7-armbru@redhat.com> [ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and auto-propagated-errp.cocci to errp-guard.cocci. Commit message tweaked again.]	2020-07-10 15:18:09 +02:00
Markus Armbruster	9261ef5e32	Clean up some calls to ignore Error objects the right way Receiving the error in a local variable only to free it is less clear (and also less efficient) than passing NULL. Clean up. Cc: Daniel P. Berrange <berrange@redhat.com> Cc: Jerome Forissier <jerome@forissier.org> CC: Greg Kurz <groug@kaod.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200630090351.1247703-4-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2020-07-02 06:25:28 +02:00
Stefano Stabellini	84af75577c	xen/9pfs: increase max ring order to 9 The max order allowed by the protocol is 9. Increase the max order supported by QEMU to 9 to increase performance. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200521192627.15259-3-sstabellini@kernel.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 11:45:40 +02:00
Stefano Stabellini	a4c4d46272	xen/9pfs: yield when there isn't enough room on the ring Instead of truncating replies, which is problematic, wait until the client reads more data and frees bytes on the reply ring. Do that by calling qemu_coroutine_yield(). The corresponding qemu_coroutine_enter_if_inactive() is called from xen_9pfs_bh upon receiving the next notification from the client. We need to be careful to avoid races in case xen_9pfs_bh and the coroutine are both active at the same time. In xen_9pfs_bh, wait until either the critical section is over (ring->co == NULL) or until the coroutine becomes inactive (qemu_coroutine_yield() was called) before continuing. Then, simply wake up the coroutine if it is inactive. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200521192627.15259-2-sstabellini@kernel.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 11:45:39 +02:00
Stefano Stabellini	cf45183b71	Revert "9p: init_in_iov_from_pdu can truncate the size" This reverts commit `16724a1730`. It causes https://bugs.launchpad.net/bugs/1877688. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200521192627.15259-1-sstabellini@kernel.org> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 11:45:38 +02:00
Greg Kurz	ed463454ef	9p: Lock directory streams with a CoMutex Locking was introduced in QEMU 2.7 to address the deprecation of readdir_r(3) in glibc 2.24. It turns out that the frontend code is the worst place to handle a critical section with a pthread mutex: the code runs in a coroutine on behalf of the QEMU mainloop and then yields control, waiting for the fsdev backend to process the request in a worker thread. If the client resends another readdir request for the same fid before the previous one finally unlocked the mutex, we're deadlocked. This never bit us because the linux client serializes readdir requests for the same fid, but it is quite easy to demonstrate with a custom client. A good solution could be to narrow the critical section in the worker thread code and to return a copy of the dirent to the frontend, but this causes quite some changes in both 9p.c and codir.c. So, instead of that, in order for people to easily backport the fix to older QEMU versions, let's simply use a CoMutex since all the users for this sit in coroutines. Fixes: `7cde47d4a8` ("9p: add locking to V9fsDir") Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <158981894794.109297.3530035833368944254.stgit@bahia.lan> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 10:38:03 +02:00
Dan Robertson	03556ea920	9pfs: include linux/limits.h for XATTR_SIZE_MAX linux/limits.h should be included for the XATTR_SIZE_MAX definition used by v9fs_xattrcreate. Fixes: `3b79ef2cf4` ("9pfs: limit xattr size in xattrcreate") Signed-off-by: Dan Robertson <dan@dlrobertson.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <20200515203015.7090-2-dan@dlrobertson.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-25 10:38:03 +02:00
Markus Armbruster	b69c3c21a5	qdev: Unrealize must not fail Devices may have component devices and buses. Device realization may fail. Realization is recursive: a device's realize() method realizes its components, and device_set_realized() realizes its buses (which should in turn realize the devices on that bus, except bus_set_realized() doesn't implement that, yet). When realization of a component or bus fails, we need to roll back: unrealize everything we realized so far. If any of these unrealizes failed, the device would be left in an inconsistent state. Must not happen. device_set_realized() lets it happen: it ignores errors in the roll back code starting at label child_realize_fail. Since realization is recursive, unrealization must be recursive, too. But how could a partly failed unrealize be rolled back? We'd have to re-realize, which can fail. This design is fundamentally broken. device_set_realized() does not roll back at all. Instead, it keeps unrealizing, ignoring further errors. It can screw up even for a device with no buses: if the lone dc->unrealize() fails, it still unregisters vmstate, and calls listeners' unrealize() callback. bus_set_realized() does not roll back either. Instead, it stops unrealizing. Fortunately, no unrealize method can fail, as we'll see below. To fix the design error, drop parameter @errp from all the unrealize methods. Any unrealize method that uses @errp now needs an update. This leads us to unrealize() methods that can fail. Merely passing it to another unrealize method cannot cause failure, though. Here are the ones that do other things with @errp: * virtio_serial_device_unrealize() Fails when qbus_set_hotplug_handler() fails, but still does all the other work. On failure, the device would stay realized with its resources completely gone. Oops. Can't happen, because qbus_set_hotplug_handler() can't actually fail here. Pass &error_abort to qbus_set_hotplug_handler() instead. 
* hw/ppc/spapr_drc.c's unrealize() Fails when object_property_del() fails, but all the other work is already done. On failure, the device would stay realized with its vmstate registration gone. Oops. Can't happen, because object_property_del() can't actually fail here. Pass &error_abort to object_property_del() instead. * spapr_phb_unrealize() Fails and bails out when remove_drcs() fails, but other work is already done. On failure, the device would stay realized with some of its resources gone. Oops. remove_drcs() fails only when chassis_from_bus()'s object_property_get_uint() fails, and it can't here. Pass &error_abort to remove_drcs() instead. Therefore, no unrealize method can fail before this patch. device_set_realized()'s recursive unrealization via bus uses object_property_set_bool(). Can't drop @errp there, so pass &error_abort. We similarly unrealize with object_property_set_bool() elsewhere, always ignoring errors. Pass &error_abort instead. Several unrealize methods no longer handle errors from other unrealize methods: virtio_9p_device_unrealize(), virtio_input_device_unrealize(), scsi_qdev_unrealize(), ... Much of the deleted error handling looks wrong anyway. One unrealize method no longer ignores such errors: usb_ehci_pci_exit(). Several realize methods no longer ignore errors when rolling back: v9fs_device_realize_common(), pci_qdev_unrealize(), spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(), virtio_device_realize(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-17-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Christian Schoenebeck	9bbb7e0fe0	xen-9pfs: Fix log messages of reply errors If delivery of some 9pfs response fails for some reason, log the error message by mentioning the 9P protocol reply type, not by client's request type. The latter could be misleading that the error occurred already when handling the request input. Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Message-Id: <ad0e5a9b6abde52502aa40b30661d29aebe1590a.1589132512.git.qemu_oss@crudebyte.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-14 08:06:43 +02:00
Omar Sandoval	a5804fcf7b	9pfs: local: ignore O_NOATIME if we don't have permissions QEMU's local 9pfs server passes through O_NOATIME from the client. If the QEMU process doesn't have permissions to use O_NOATIME (namely, it does not own the file nor have the CAP_FOWNER capability), the open will fail. This causes issues when from the client's point of view, it believes it has permissions to use O_NOATIME (e.g., a process running as root in the virtual machine). Additionally, overlayfs on Linux opens files on the lower layer using O_NOATIME, so in this case a 9pfs mount can't be used as a lower layer for overlayfs (cf. `dabfe19719/vmtest/onoatimehack.c` and https://github.com/NixOS/nixpkgs/issues/54509). Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g., network filesystems. open(2) notes that O_NOATIME "may not be effective on all filesystems. One example is NFS, where the server maintains the access time." This means that we can honor it when possible but fall back to ignoring it. Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Message-Id: <e9bee604e8df528584693a4ec474ded6295ce8ad.1587149256.git.osandov@fb.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2020-05-14 08:06:43 +02:00
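The "honor it when possible but fall back to ignoring it" behaviour described in the commit above can be sketched in a few lines of C. This is only an illustration of the idea, not the code from the patch: the helper name `open_noatime_fallback` is hypothetical, and it uses plain `open(2)` rather than whatever helpers the 9pfs local backend actually calls.

```c
#define _GNU_SOURCE          /* exposes O_NOATIME in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_NOATIME
#define O_NOATIME 0          /* no such flag on this platform: nothing to strip */
#endif

/*
 * Hypothetical helper (an assumption for illustration, not QEMU's function):
 * open `path`, keeping O_NOATIME when the kernel accepts it and retrying
 * without the flag when the first attempt fails with EPERM.
 * `flags` must not contain O_CREAT, since no mode argument is passed.
 */
int open_noatime_fallback(const char *path, int flags)
{
    assert(path != NULL);

    int fd = open(path, flags);
    if (fd == -1 && errno == EPERM && (flags & O_NOATIME)) {
        /*
         * With O_NOATIME set, EPERM typically means the caller neither owns
         * the file nor holds CAP_FOWNER. Treat the flag as a hint and retry.
         */
        fd = open(path, flags & ~O_NOATIME);
    }
    return fd;               /* -1 with errno set if both attempts failed */
}
```

A caller would pass the client-supplied flags unchanged, for example `open_noatime_fallback("/dev/null", O_RDONLY | O_NOATIME)`. An unprivileged process that does not own the file then still gets a usable descriptor, just without the access-time optimisation.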
Greg Kurz	659f195328	9p/proxy: Fix export_flags The common fsdev options are set by qemu_fsdev_add() before it calls the backend specific option parsing code. In the case of "proxy" this means "writeout" or "readonly" were simply ignored. This has been broken from the beginning. Reported-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Message-Id: <158349633705.1237488.8895481990204796135.stgit@bahia.lan>	2020-03-10 16:12:49 +01:00


539 Commits