qemu-e2k

Author	SHA1	Message	Date
Stefan Hajnoczi	6d740fb01b	aio-posix: do not nest poll handlers QEMU's event loop supports nesting, which means that event handler functions may themselves call aio_poll(). The condition that triggered a handler must be reset before the nested aio_poll() call, otherwise the same handler will be called and immediately re-enter aio_poll. This leads to an infinite loop and stack exhaustion. Poll handlers are especially prone to this issue, because they typically reset their condition by finishing the processing of pending work. Unfortunately it is during the processing of pending work that nested aio_poll() calls typically occur and the condition has not yet been reset. Disable a poll handler during ->io_poll_ready() so that a nested aio_poll() call cannot invoke ->io_poll_ready() again. As a result, the disabled poll handler and its associated fd handler do not run during the nested aio_poll(). Calling aio_set_fd_handler() from inside nested aio_poll() could cause it to run again. If the fd handler is pending inside nested aio_poll(), then it will also run again. In theory fd handlers can be affected by the same issue, but they are more likely to reset the condition before calling nested aio_poll(). This is a special case and it's somewhat complex, but I don't see a way around it as long as nested aio_poll() is supported. Cc: qemu-stable@nongnu.org Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2186181 Fixes: `c382706925` ("block: Mark bdrv_co_io_(un)plug() and callers GRAPH_RDLOCK") Cc: Kevin Wolf <kwolf@redhat.com> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230502184134.534703-2-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	78935fcd88	iotests/245: Check if 'compress' driver is available Skip TestBlockdevReopen.test_insert_compress_filter() if the 'compress' driver isn't available. In order to make the test succeed when the case is skipped, we also need to remove any output from it (which would be missing in the case where we skip it). This is done by replacing qemu_io_log() with qemu_io(). In case of failure, qemu_io() raises an exception with the output of the qemu-io binary in its message, so we don't actually lose anything. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230511143801.255021-1-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	71438d8dac	graph-lock: Honour read locks even in the main thread There are some conditions under which we don't actually need to do anything for taking a reader lock: Writing the graph is only possible from the main context while holding the BQL. So if a reader is running in the main context under the BQL and knows that it won't be interrupted until the next writer runs, we don't actually need to do anything. This is the case if the reader code neither has a nested event loop (this is forbidden anyway while you hold the lock) nor is a coroutine (because a writer could run when the coroutine has yielded). These conditions are exactly what bdrv_graph_rdlock_main_loop() asserts. They are not fulfilled in bdrv_graph_co_rdlock(), which always runs in a coroutine. This deletes the shortcuts in bdrv_graph_co_rdlock() that skip taking the reader lock in the main thread. Reported-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-9-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	018e5987b5	blockjob: Adhere to rate limit even when reentered early When jobs are sleeping, for example to enforce a given rate limit, they can be reentered early, in particular in order to get paused, to update the rate limit or to get cancelled. Before this patch, they behave in this case as if they had fully completed their rate limiting delay. This means that requests are sped up beyond their limit, violating the constraints that the user gave us. Change the block jobs to sleep in a loop until the necessary delay is completed, while still allowing cancelling them immediately as well pausing (handled by the pause point in job_sleep_ns()) and updating the rate limit. This change is also motivated by iotests cases being prone to fail because drain operations pause and unpause them so often that block jobs complete earlier than they are supposed to. In particular, the next commit would fail iotests 030 without this change. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-8-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	01a10c2433	test-bdrv-drain: Call bdrv_co_unref() in coroutine context bdrv_unref() is a no_coroutine_fn, so calling it from coroutine context is invalid. Use bdrv_co_unref() instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-7-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	87f130bdaa	test-bdrv-drain: Take graph lock more selectively If we take a reader lock, we can't call any functions that take a writer lock internally without causing deadlocks once the reader lock is actually enforced in the main thread, too. Take the reader lock only where it is actually needed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-6-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	3db0c8b25c	qemu-img: Take graph lock more selectively If we take a reader lock, we can't call any functions that take a writer lock internally without causing deadlocks once the reader lock is actually enforced in the main thread, too. Take the reader lock only where it is actually needed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-5-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	e3e31dc872	qcow2: Unlock the graph in qcow2_do_open() where necessary qcow2_do_open() calls a few no_co_wrappers that wrap functions taking the graph lock internally as a writer. Therefore, it can't hold the reader lock across these calls, it causes deadlocks. Drop the lock temporarily around the calls. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-4-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	a184563778	block/export: Fix null pointer dereference in error path There are some error paths in blk_exp_add() that jump to 'fail:' before 'exp' is even created. So we can't just unconditionally access exp->blk. Add a NULL check, and switch from exp->blk to blk, which is available earlier, just to be extra sure that we really cover all cases where BlockDevOps could have been set for it (in practice, this only happens in drv->create() today, so this part of the change isn't strictly necessary). Fixes: Coverity CID 1509238 Fixes: `de79b52604` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Kevin Wolf	4db7ba3b87	block: Call .bdrv_co_create(_opts) unlocked These are functions that modify the graph, so they must be able to take a writer lock. This is impossible if they already hold the reader lock. If they need a reader lock for some of their operations, they should take it internally. Many of them go through blk_(), which will always take the lock itself. Direct calls of bdrv_() need to take the reader lock. Note that while locking for bdrv_co_() calls is checked by TSA, this is not the case for the mixed_coroutine_fns bdrv_(). Holding the lock is still required when they are called from coroutine context like here! This effectively reverts `4ec8df0183`, but adds some internal locking instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Akihiro Suda	41f8b63339	docs/interop/qcow2.txt: fix description about "zlib" clusters "zlib" clusters are actually raw deflate (RFC1951) clusters without zlib headers. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> Message-Id: <168424874322.11954.1340942046351859521-0@git.sr.ht> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Vladimir Sementsov-Ogievskiy	d53c89aed1	blockdev: qmp_transaction: drop extra generic layer Let's simplify things: First, actions generally don't need access to common BlkActionState structure. The only exclusion are backup actions that need block_job_txn. Next, for transaction actions of Transaction API is more native to allocated state structure in the action itself. So, do the following transformation: 1. Let all actions be represented by a function with corresponding structure as arguments. 2. Instead of array-map marshaller, let's make a function, that calls corresponding action directly. 3. BlkActionOps and BlkActionState structures become unused Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230510150624.310640-7-vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Vladimir Sementsov-Ogievskiy	c85feafa98	blockdev: use state.bitmap in block-dirty-bitmap-add action Other bitmap related actions use the .bitmap pointer in .abort action, let's do same here: 1. It helps further refactoring, as bitmap-add is the only bitmap action that uses state.action in .abort 2. It must be safe: transaction actions rely on the fact that on .abort() the state is the same as at the end of .prepare(), so that in .abort() we could precisely rollback the changes done by .prepare(). The only way to remove the bitmap during transaction should be block-dirty-bitmap-remove action, but it postpones actual removal to .commit(), so we are OK on any rollback path. (Note also that bitmap-remove is the only bitmap action that has .commit() phase, except for simple g_free the state on .clean()) 3. Again, other bitmap actions behave this way: keep the bitmap pointer during the transaction. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230510150624.310640-6-vsementsov@yandex-team.ru> [kwolf: Also remove the now unused BlockDirtyBitmapState.prepared] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Vladimir Sementsov-Ogievskiy	c85f34cf89	blockdev: transaction: refactor handling transaction properties Only backup supports GROUPED mode. Make this logic more clear. And avoid passing extra thing to each action. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230510150624.310640-5-vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Vladimir Sementsov-Ogievskiy	30c96b5559	blockdev: qmp_transaction: refactor loop to classic for Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510150624.310640-4-vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Vladimir Sementsov-Ogievskiy	240396965f	blockdev: transactions: rename some things Look at qmp_transaction(): dev_list is not obvious name for list of actions. Let's look at qapi spec, this argument is "actions". Let's follow the common practice of using same argument names in qapi scheme and code. To be honest, rename props to properties for same reason. Next, we have to rename global map of actions, to not conflict with new name for function argument. Rename also dev_entry loop variable accordingly to new name of the list. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510150624.310640-3-vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Vladimir Sementsov-Ogievskiy	8187f63c9c	blockdev: refactor transaction to use Transaction API We are going to add more block-graph modifying transaction actions, and block-graph modifying functions are already based on Transaction API. Next, we'll need to separately update permissions after several graph-modifying actions, and this would be simple with help of Transaction API. So, now let's just transform what we have into new-style transaction actions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230510150624.310640-2-vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-19 19:12:12 +02:00
Peter Maydell	d009607d08	Revert "arm/kvm: add support for MTE" This reverts commit `b320e21c48`, which accidentally broke TCG, because it made the TCG -cpu max report the presence of MTE to the guest even if the board hadn't enabled MTE by wiring up the tag RAM. This meant that if the guest then tried to use MTE QEMU would segfault accessing the non-existent tag RAM: ==346473==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address (pc 0x55f328952a4a bp 0x00000213a400 sp 0x7f7871859b80 T346476) ==346473==The signal is caused by a READ memory access. ==346473==Hint: this fault was caused by a dereference of a high value address (see register values below). Disassemble the provided pc to learn which register was used. #0 0x55f328952a4a in address_space_to_flatview /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:1108:12 #1 0x55f328952a4a in address_space_translate /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/exec/memory.h:2797:31 #2 0x55f328952a4a in allocation_tag_mem /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:176:10 #3 0x55f32895366c in helper_stgm /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/../../target/arm/tcg/mte_helper.c:461:15 #4 0x7f782431a293 (<unknown module>) It's also not clear that the KVM logic is correct either: MTE defaults to on there, rather than being only on if the board wants it on. Revert the whole commit for now so we can sort out the issues. (We didn't catch this in CI because we have no test cases in avocado that use guests with MTE support.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20230519145808.348701-1-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-05-19 08:01:15 -07:00
Bernhard Beschow	87af48a49c	hw/i386/pc: No need for rtc_state to be an out-parameter Now that the RTC is created as part of the southbridges it doesn't need to be an out-parameter any longer. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230519084734.220480-3-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	f0bc6bf725	hw/i386/pc: Create RTC controllers in south bridges Just like in the real hardware (and in PIIX4), create the RTC controllers in the south bridges. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230519084734.220480-2-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Ira Weiny	547a652fd1	hw/cxl: Introduce cxl_device_get_timestamp() utility function There are new users of this functionality coming shortly so factor it out from the GET_TIMESTAMP mailbox command handling. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230423162013.4535-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Jonathan Cameron	b6aab45971	hw/cxl: rename mailbox return code type from ret_code to CXLRetCode Given the increasing usage of this mailbox return code type, now is a good time to switch to QEMU style naming. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230423162013.4535-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Sebastian Ott	6a36a4ced8	hw/pci-bridge: make building pcie-to-pci bridge configurable Introduce a CONFIG option to build the pcie-to-pci bridge. No functional change since it's enabled per default for PCIE_PORT=y. Signed-off-by: Sebastian Ott <sebott@redhat.com> Message-Id: <72b6599d-6b27-00b5-aac5-2ebc16a2e023@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Viktor Prutyanov	206e91d143	virtio-pci: add handling of PCI ATS and Device-TLB enable/disable According to PCIe Address Translation Services specification 5.1.3., ATS Control Register has Enable bit to enable/disable ATS. Guest may enable/disable PCI ATS and, accordingly, Device-TLB for the VirtIO PCI device. So, raise/lower a flag and call a trigger function to pass this event to a device implementation. Signed-off-by: Viktor Prutyanov <viktor@daynix.com> Message-Id: <20230512135122.70403-2-viktor@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	9e57b81861	hw/pci-host/pam: Make init_pam() usage more readable Unlike pam_update() which takes the subject -- PAMMemoryRegion -- as first argument, init_pam() takes it as fifth (!) argument. This makes it quite hard to figure out what an init_pam() invocation actually initializes. By moving the subject to the front this should become clearer. While at it, lower the DeviceState parameter to Object, also communicating more clearly that this parameter is just the owner rather than some (heavy?) dependency. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230213162004.2797-8-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	f9fddaf7ce	hw/i386/pc: Initialize ram_memory variable directly Going through pc_memory_init() seems quite complicated for a simple assignment. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230213162004.2797-7-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	8631743c09	hw/i386/pc_{q35,piix}: Minimize usage of get_system_memory() Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230213162004.2797-6-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	1e366da031	hw/i386/pc_{q35,piix}: Reuse MachineClass::desc as SMB product name No need to repeat the descriptions. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230213162004.2797-5-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	1ab7167b09	hw/i386/pc_q35: Reuse machine parameter Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230213162004.2797-4-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	67b4a74a07	hw/pci-host/q35: Inline sysbus_add_io() sysbus_add_io() just wraps memory_region_add_subregion() while also obscuring where the memory is attached. So use memory_region_add_subregion() directly and attach it to the existing memory region s->mch.address_space_io which is set as an alias to get_system_io() by the q35 machine. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230213162004.2797-3-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Bernhard Beschow	273d65020b	hw/pci-host/i440fx: Inline sysbus_add_io() sysbus_add_io() just wraps memory_region_add_subregion() while also obscuring where the memory is attached. So use memory_region_add_subregion() directly and attach it to the existing memory region s->bus->address_space_io which is set as an alias to get_system_io() by the pc machine. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230213162004.2797-2-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-05-19 10:30:46 -04:00
Cindy Lu	bc7b0cac7b	vhost-vdpa: Add support for vIOMMU. 1. The vIOMMU support will make vDPA can work in IOMMU mode. This will fix security issues while using the no-IOMMU mode. To support this feature we need to add new functions for IOMMU MR adds and deletes. Also since the SVQ does not support vIOMMU yet, add the check for IOMMU in vhost_vdpa_dev_start, if the SVQ and IOMMU enable at the same time the function will return fail. 2. Skip the iova_max check vhost_vdpa_listener_skipped_section(). While MR is IOMMU, move this check to vhost_vdpa_iommu_map_notify() Verified in vp_vdpa and vdpa_sim_net driver Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20230510054631.2951812-5-lulu@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Cindy Lu	2fbef6aad8	vhost-vdpa: Add check for full 64-bit in region delete The unmap ioctl doesn't accept a full 64-bit span. So need to add check for the section's size in vhost_vdpa_listener_region_del(). Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20230510054631.2951812-4-lulu@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Cindy Lu	3d1e4d34a8	vhost_vdpa: fix the input in trace_vhost_vdpa_listener_region_del() In trace_vhost_vdpa_listener_region_del, the value for llend should change to int128_get64(int128_sub(llend, int128_one())) Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20230510054631.2951812-3-lulu@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Cindy Lu	74b5d2b56c	vhost: expose function vhost_dev_has_iommu() To support vIOMMU in vdpa, need to exposed the function vhost_dev_has_iommu, vdpa will use this function to check if vIOMMU enable. Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20230510054631.2951812-2-lulu@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Mauro Matteo Cascella	3e69908907	virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM algtype. Fixes: `0e660a6f90` ("crypto: Introduce RSA algorithm") Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com> Reported-by: Yiming Tao <taoym@zju.edu.cn> Message-Id: <20230509075317.1132301-1-mcascell@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: zhenwei pi<pizhenwei@bytedance.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Eugenio Pérez	1fac00f70b	virtio-net: not enable vq reset feature unconditionally The commit `93a97dc520` ("virtio-net: enable vq reset feature") enables unconditionally vq reset feature as long as the device is emulated. This makes impossible to actually disable the feature, and it causes migration problems from qemu version previous than 7.2. The entire final commit is unneeded as device system already enable or disable the feature properly. This reverts commit `93a97dc520`. Fixes: `93a97dc520` ("virtio-net: enable vq reset feature") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230504101447.389398-1-eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
David Hildenbrand	bab105300b	vhost-user: Remove acpi-specific memslot limit Let's just support 512 memslots on x86-64 and aarch64 as well. The maximum number of ACPI slots (256) is no longer completely expressive ever since we supported virtio-based memory devices. Further, we're completely ignoring other memslots used outside of memory device context, such as memslots used for boot memory. Note that the vhost memslot limit in the kernel is usually configured to be 509. With this change, we prepare vhost-user on the QEMU side to be closer to that limit, to eventually support ~512 memslots in most vhost implementations and have less "surprises" when cold/hotplugging vhost devices while also consuming more memslots than we're currently used to by memory devices (e.g., once virtio-mem starts using multiple memslots). Note that most vhost-user implementations only support a small number of memslots so far, which we can hopefully improve in the near future. We'll leave the PPC special-case as is for now. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230503184144.808478-1-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
David Hildenbrand	d5cef02574	virtio-mem: Default to "unplugged-inaccessible=on" with 8.1 on x86-64 Allowing guests to read unplugged memory simplified the bring-up of virtio-mem in Linux guests -- which was limited to x86-64 only. On arm64 (which was added later), we never had legacy guests and don't even allow to configure it, essentially always having "unplugged-inaccessible=on". At this point, all guests we care about should be supporting VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, so let's change the default for the 8.1 machine. This change implies that also memory that supports the shared zeropage (private anonymous memory) will now require VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE in the driver in order to be usable by the guest -- as default, one can still manually set the unplugged-inaccessible property. Disallowing the guest to read unplugged memory will be important for some future features, such as memslot optimizations or protection of unplugged memory, whereby we'll actually no longer allow the guest to even read from unplugged memory. At some point, we might want to deprecate and remove that property. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Eduardo Habkost <eduardo@habkost.net> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230503182352.792458-1-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Leonardo Bras	5ed3dabe57	hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0 Since it's implementation on v8.0.0-rc0, having the PCI_ERR_UNCOR_MASK set for machine types < 8.0 will cause migration to fail if the target QEMU version is < 8.0.0 : qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0 qemu-system-x86_64: Failed to load PCIDevice:config qemu-system-x86_64: Failed to load e1000e:parent_obj qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:02.0/e1000e' qemu-system-x86_64: load of migration failed: Invalid argument The above test migrated a 7.2 machine type from QEMU master to QEMU 7.2.0, with this cmdline: ./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX] In order to fix this, property x-pcie-err-unc-mask was introduced to control when PCI_ERR_UNCOR_MASK is enabled. This property is enabled by default, but is disabled if machine type <= 7.2. Fixes: `010746ae1d` ("hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register") Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Message-Id: <20230503002701.854329-1-leobras@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1576 Tested-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Stefan Hajnoczi	6f8be29ec1	vhost-user: send SET_STATUS 0 after GET_VRING_BASE Setting the VIRTIO Device Status Field to 0 resets the device. The device's state is lost, including the vring configuration. vhost-user.c currently sends SET_STATUS 0 before GET_VRING_BASE. This risks confusion about the lifetime of the vhost-user state (e.g. vring last_avail_idx) across VIRTIO device reset. Eugenio Pérez <eperezma@redhat.com> adjusted the order for vhost-vdpa.c in commit `c3716f260b` ("vdpa: move vhost reset after get vring base") and in that commit description suggested doing the same for vhost-user in the future. Go ahead and adjust vhost-user.c now. I ran various online code searches to identify vhost-user backends implementing SET_STATUS. It seems only DPDK implements SET_STATUS and Yajun Wu <yajunw@nvidia.com> has confirmed that it is safe to make this change. Fixes: commit `923b8921d2` ("vhost-user: Support vhost_dev_start") Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Cindy Lu <lulu@redhat.com> Cc: Yajun Wu <yajunw@nvidia.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230501230409.274178-1-stefanha@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Yajun Wu <yajunw@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 10:30:46 -04:00
Vladimir Sementsov-Ogievskiy	5b52692f9d	pci: pci_add_option_rom(): refactor: use g_autofree for path variable Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20230515125229.44836-3-vsementsov@yandex-team.ru> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2023-05-19 01:36:09 -04:00
Vladimir Sementsov-Ogievskiy	4ab049c7e6	pci: pci_add_option_rom(): improve style Fix over-80 lines and missing curly brackets for if-operators, which are required by QEMU coding style. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20230515125229.44836-2-vsementsov@yandex-team.ru> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2023-05-19 01:36:09 -04:00
Eric DeVolder	1141159cb4	ACPI: bios-tables-test.c step 5 (update expected table binaries) Following the guidelines in tests/qtest/bios-tables-test.c, this is step 5 and 6. An examination of all the files impacted (as listed in bios-tables-test-allowe-diff.h) shows only the MADT/APIC tables bumping revision from 1 to 3, and a corresponding change to the checksum. The below diff is typical: --- /tmp/asl-1F9641.dsl 2023-05-16 15:18:31.292579156 -0400 +++ /tmp/asl-GVD741.dsl 2023-05-16 15:18:31.291579149 -0400 @@ -1,32 +1,32 @@ /* * Intel ACPI Component Architecture * AML/ASL+ Disassembler version 20230331 (64-bit version) * Copyright (c) 2000 - 2023 Intel Corporation * - * Disassembly of tests/data/acpi/pc/APIC, Tue May 16 15:18:31 2023 + * Disassembly of /tmp/aml-R4D741, Tue May 16 15:18:31 2023 * * ACPI Data Table [APIC] * * Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue (in hex) */ [000h 0000 004h] Signature : "APIC" [Multiple APIC Description Table (MADT)] [004h 0004 004h] Table Length : 00000078 -[008h 0008 001h] Revision : 01 -[009h 0009 001h] Checksum : 8A +[008h 0008 001h] Revision : 03 +[009h 0009 001h] Checksum : 88 [00Ah 0010 006h] Oem ID : "BOCHS " [010h 0016 008h] Oem Table ID : "BXPC " [018h 0024 004h] Oem Revision : 00000001 [01Ch 0028 004h] Asl Compiler ID : "BXPC" [020h 0032 004h] Asl Compiler Revision : 00000001 [024h 0036 004h] Local Apic Address : FEE00000 [028h 0040 004h] Flags (decoded below) : 00000001 PC-AT Compatibility : 1 [02Ch 0044 001h] Subtable Type : 00 [Processor Local APIC] [02Dh 0045 001h] Length : 08 [02Eh 0046 001h] Processor ID : 00 [02Fh 0047 001h] Local Apic ID : 00 [030h 0048 004h] Flags (decoded below) : 00000001 Processor Enabled : 1 @@ -81,24 +81,24 @@ [06Bh 0107 001h] Source : 0B [06Ch 0108 004h] Interrupt : 0000000B [070h 0112 002h] Flags (decoded below) : 000D Polarity : 1 Trigger Mode : 3 [072h 0114 001h] Subtable Type : 04 [Local APIC NMI] [073h 0115 001h] Length : 06 [074h 0116 001h] Processor ID : FF [075h 0117 002h] Flags (decoded below) : 0000 Polarity : 0 Trigger Mode : 0 [077h 0119 001h] Interrupt Input LINT : 01 Raw Table Data: Length 120 (0x78) - 0000: 41 50 49 43 78 00 00 00 01 8A 42 4F 43 48 53 20 // APICx.....BOCHS + 0000: 41 50 49 43 78 00 00 00 03 88 42 4F 43 48 53 20 // APICx.....BOCHS 0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43 // BXPC ....BXPC 0020: 01 00 00 00 00 00 E0 FE 01 00 00 00 00 08 00 00 // ................ 0030: 01 00 00 00 01 0C 00 00 00 00 C0 FE 00 00 00 00 // ................ 0040: 02 0A 00 00 02 00 00 00 00 00 02 0A 00 05 05 00 // ................ 0050: 00 00 0D 00 02 0A 00 09 09 00 00 00 0D 00 02 0A // ................ 0060: 00 0A 0A 00 00 00 0D 00 02 0A 00 0B 0B 00 00 00 // ................ 0070: 0D 00 04 06 FF 00 00 01 // ........ Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Message-Id: <20230517162545.2191-4-eric.devolder@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Ani Sinha <anisinha@redhat.com>	2023-05-19 01:36:09 -04:00
Eric DeVolder	6da94e277c	ACPI: i386: bump to MADT to revision 3 Currently i386 QEMU generates MADT revision 3, and reports MADT revision 1. Set .revision to 3 to match reality. Link: https://lore.kernel.org/linux-acpi/20230327191026.3454-1-eric.devolder@ora cle.com/T/#t Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20230517162545.2191-3-eric.devolder@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>	2023-05-19 01:36:09 -04:00
Eric DeVolder	354b09d228	ACPI: bios-tables-test.c step 2 (allowed-diff entries) Following the guidelines in tests/qtest/bios-tables-test.c, set up bios-tables-test-allowed-diff.h to ignore the imminent changes to the APIC tables, per step 2. Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Message-Id: <20230517162545.2191-2-eric.devolder@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Ani Sinha <ani@anisinha.ca>	2023-05-19 01:36:09 -04:00
Gregory Price	adacc814f5	hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) This commit enables each CXL Type-3 device to contain one volatile memory region and one persistent region. Two new properties have been added to cxl-type3 device initialization: [volatile-memdev] and [persistent-memdev] The existing [memdev] property has been deprecated and will default the memory region to a persistent memory region (although a user may assign the region to a ram or file backed region). It cannot be used in combination with the new [persistent-memdev] property. Partitioning volatile memory from persistent memory is not yet supported. Volatile memory is mapped at DPA(0x0), while Persistent memory is mapped at DPA(vmem->size), per CXL Spec 8.2.9.8.2.0 - Get Partition Info. Signed-off-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Fan Ni <fan.ni@samsung.com> Tested-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230421160827.2227-4-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 01:36:09 -04:00
Jonathan Cameron	3521176526	hw/mem: Use memory_region_size() in cxl_type3 Accessors prefered over direct use of int128_get64() as they clamp out of range values. None are expected here but cleaner to always use the accessor than mix and match. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230421160827.2227-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Gregory Price <gregory.price@memverge.com>	2023-05-19 01:36:09 -04:00
Gregory Price	847ea4e746	tests/qtest/cxl-test: whitespace, line ending cleanup Defines are starting to exceed line length limits, align them for cleanliness before making modifications. Signed-off-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230421160827.2227-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 01:36:09 -04:00
Jonathan Cameron	823371a630	hw/cxl: Fix incorrect reset of commit and associated clearing of committed. The hardware clearing the commit bit is not spec compliant. Clearing of committed bit when commit is cleared is not specifically stated in the CXL spec, but is the expected (and simplest) permitted behaviour so use that for QEMU emulation. Reviewed-by: Fan Ni <fan.ni@samsung.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> -- v2: Picked up tags. Message-Id: <20230421135906.3515-4-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-05-19 01:36:09 -04:00

... 2 3 4 5 6 ...

104091 Commits