qemu-e2k

Author	SHA1	Message	Date
Cédric Le Goater	44fa20c928	spapr: Remove support for NVIDIA V100 GPU with NVLink2 NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits : 562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") This was 2.5 years ago. Do the same in QEMU with a revert of commit `ec132efaa8` ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustements are required on the NUMA part. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2023-09-18 07:25:28 -03:00
Joao Martins	a31fe5daea	vfio/common: Separate vfio-pci ranges QEMU computes the DMA logging ranges for two predefined ranges: 32-bit and 64-bit. In the OVMF case, when the dynamic MMIO window is enabled, QEMU includes in the 64-bit range the RAM regions at the lower part and vfio-pci device RAM regions which are at the top of the address space. This range contains a large gap and the size can be bigger than the dirty tracking HW limits of some devices (MLX5 has a 2^42 limit). To avoid such large ranges, introduce a new PCI range covering the vfio-pci device RAM regions, this only if the addresses are above 4GB to avoid breaking potential SeaBIOS guests. [ clg: - wrote commit log - fixed overlapping 32-bit and PCI ranges when using SeaBIOS ] Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Fixes: `5255bbf4ec` ("vfio/common: Add device dirty page tracking start/stop") Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	615379764a	vfio/migration: Block VFIO migration with background snapshot Background snapshot allows creating a snapshot of the VM while it's running and keeping it small by not including dirty RAM pages. The way it works is by first stopping the VM, saving the non-iterable devices' state and then starting the VM and saving the RAM while write protecting it with UFFD. The resulting snapshot represents the VM state at snapshot start. VFIO migration is not compatible with background snapshot. First of all, VFIO device state is not even saved in background snapshot because only non-iterable device state is saved. But even if it was saved, after starting the VM, a VFIO device could dirty pages without it being detected by UFFD write protection. This would corrupt the snapshot, as the RAM in it would not represent the RAM at snapshot start. To prevent this, block VFIO migration with background snapshot. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	bf7ef7a2da	vfio/migration: Block VFIO migration with postcopy migration VFIO migration is not compatible with postcopy migration. A VFIO device in the destination can't handle page faults for pages that have not been sent yet. Doing such migration will cause the VM to crash in the destination: qemu-system-x86_64: VFIO_MAP_DMA failed: Bad address qemu-system-x86_64: vfio_dma_map(0x55a28c7659d0, 0xc0000, 0xb000, 0x7f1b11a00000) = -14 (Bad address) qemu: hardware error: vfio: DMA mapping failed, unable to continue To prevent this, block VFIO migration with postcopy migration. Reported-by: Yanghang Liu <yanghliu@redhat.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Tested-by: Yanghang Liu <yanghliu@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	8118349b1b	vfio/migration: Fail adding device with enable-migration=on and existing blocker If a device with enable-migration=on is added and it causes a migration blocker, adding the device should fail with a proper error. This is not the case with multiple device migration blocker when the blocker already exists. If the blocker already exists and a device with enable-migration=on is added which causes a migration blocker, adding the device will succeed. Fix it by failing adding the device in such case. Fixes: `8bbcb64a71` ("vfio/migration: Make VFIO migration non-experimental") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	5c7a4b6035	vfio/migration: Allow migration of multiple P2P supporting devices Now that P2P support has been added to VFIO migration, allow migration of multiple devices if all of them support P2P migration. Single device migration is allowed regardless of P2P migration support. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
Avihai Horon	94f775e428	vfio/migration: Add P2P support for VFIO migration VFIO migration uAPI defines an optional intermediate P2P quiescent state. While in the P2P quiescent state, P2P DMA transactions cannot be initiated by the device, but the device can respond to incoming ones. Additionally, all outstanding P2P transactions are guaranteed to have been completed by the time the device enters this state. The purpose of this state is to support migration of multiple devices that might do P2P transactions between themselves. Add support for P2P migration by transitioning all the devices to the P2P quiescent state before stopping or starting the devices. Use the new VMChangeStateHandler prepare_cb to achieve that behavior. This will allow migration of multiple VFIO devices if all of them support P2P migration. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
Joao Martins	3d4d0f0e06	vfio/migration: Refactor PRE_COPY and RUNNING state checks Move the PRE_COPY and RUNNING state checks to helper functions. This is in preparation for adding P2P VFIO migration support, where these helpers will also test for PRE_COPY_P2P and RUNNING_P2P states. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
Avihai Horon	5485298ce0	vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Changing the device state from STOP_COPY to STOP can take time as the device may need to free resources and do other operations as part of the transition. Currently, this is done in vfio_save_complete_precopy() and therefore it is counted in the migration downtime. To avoid this, change the device state from STOP_COPY to STOP in vfio_save_cleanup(), which is called after migration has completed and thus is not part of migration downtime. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:05 +02:00
Alex Williamson	c00aac6f14	vfio/pci: Enable AtomicOps completers on root ports Dynamically enable Atomic Ops completer support around realize/exit of vfio-pci devices reporting host support for these accesses and adhering to a minimal configuration standard. While the Atomic Ops completer bits in the root port device capabilities2 register are read-only, the PCIe spec does allow RO bits to change to reflect hardware state. We take advantage of that here around the realize and exit functions of the vfio-pci device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Robin Voetter <robin@streamhpc.com> Tested-by: Robin Voetter <robin@streamhpc.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Tony Krowiak	1360b2ad1f	s390x/ap: Wire up the device request notifier interface Let's wire up the device request notifier interface to handle device unplug requests for AP. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/qemu-devel/20230530225544.280031-1-akrowiak@linux.ibm.com/ Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Avihai Horon	8af87a3ec7	vfio: Fix null pointer dereference bug in vfio_bars_finalize() vfio_realize() has the following flow: 1. vfio_bars_prepare() -- sets VFIOBAR->size. 2. msix_early_setup(). 3. vfio_bars_register() -- allocates VFIOBAR->mr. After vfio_bars_prepare() is called msix_early_setup() can fail. If it does fail, vfio_bars_register() is never called and VFIOBAR->mr is not allocated. In this case, vfio_bars_finalize() is called as part of the error flow to free the bars' resources. However, vfio_bars_finalize() calls object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and thus we get a null pointer dereference. Fix it by checking VFIOBAR->mr in vfio_bars_finalize(). Fixes: `89d5202edc` ("vfio/pci: Allow relocating MSI-X MMIO") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	d4a2af747d	vfio/migration: Return bool type for vfio_migration_realize() Make vfio_migration_realize() adhere to the convention of other realize() callbacks(like qdev_realize) by returning bool instead of int. Suggested-by: Cédric Le Goater <clg@redhat.com> Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	0520d63c77	vfio/migration: Remove print of "Migration disabled" Property enable_migration supports [on/off/auto]. In ON mode, error pointer is passed to errp and logged. In OFF mode, we doesn't need to log "Migration disabled" as it's intentional. In AUTO mode, we should only ever see errors or warnings if the device supports migration and an error or incompatibility occurs while further probing or configuring it. Lack of support for migration shoundn't generate an error or warning. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	2b43b2995b	vfio/migration: Free resources when vfio_migration_realize fails When vfio_realize() succeeds, hot unplug will call vfio_exitfn() to free resources allocated in vfio_realize(); when vfio_realize() fails, vfio_exitfn() is never called and we need to free resources in vfio_realize(). In the case that vfio_migration_realize() fails, e.g: with -only-migratable & enable-migration=off, we see below: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off 0000:81:11.1: Migration disabled Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device If we hotplug again we should see same log as above, but we see: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off Error: vfio 0000:81:11.1: device is already attached That's because some references to VFIO device isn't released. For resources allocated in vfio_migration_realize(), free them by jumping to out_deinit path with calling a new function vfio_migration_deinit(). For resources allocated in vfio_realize(), free them by jumping to de-register path in vfio_realize(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: `a22651053b` ("vfio: Make vfio-pci device migration capable") Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	3c26c80a0a	vfio/migration: Change vIOMMU blocker from global to per device Contrary to multiple device blocker which needs to consider already-attached devices to unblock/block dynamically, the vIOMMU migration blocker is a device specific config. Meaning it only needs to know whether the device is bypassing or not the vIOMMU (via machine property, or per pxb-pcie::bypass_iommu), and does not need the state of currently present devices. For this reason, the vIOMMU global migration blocker can be consolidated into the per-device migration blocker, allowing us to remove some unnecessary code. This change also makes vfio_mig_active() more accurate as it doesn't check for global blocker. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	adee0da036	vfio/pci: Disable INTx in vfio_realize error path When vfio realize fails, INTx isn't disabled if it has been enabled. This may confuse host side with unhandled interrupt report. Fixes: `c5478fea27` ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Alex Williamson	0ddcb39c93	hw/vfio/pci-quirks: Sanitize capability pointer Coverity reports a tained scalar when traversing the capabilities chain (CID 1516589). In practice I've never seen a device with a chain so broken as to cause an issue, but it's also pretty easy to sanitize. Fixes: `f6b30c1984` ("hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-07-10 09:52:52 +02:00
Zhenzhong Duan	0cc889c882	vfio/pci: Free leaked timer in vfio_realize error path When vfio_realize fails, the mmap_timer used for INTx optimization isn't freed. As this timer isn't activated yet, the potential impact is just a piece of leaked memory. Fixes: `ea486926b0` ("vfio-pci: Update slow path INTx algorithm timer related") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Zhenzhong Duan	357bd7932a	vfio/pci: Fix a segfault in vfio_realize The kvm irqchip notifier is only registered if the device supports INTx, however it's unconditionally removed in vfio realize error path. If the assigned device does not support INTx, this will cause QEMU to crash when vfio realize fails. Change it to conditionally remove the notifier only if the notify hook is setup. Before fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Connection closed by foreign host. After fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Error: vfio 0000:81:11.1: xres and yres properties require display=on (qemu) Fixes: `c5478fea27` ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	8bbcb64a71	vfio/migration: Make VFIO migration non-experimental The major parts of VFIO migration are supported today in QEMU. This includes basic VFIO migration, device dirty page tracking and precopy support. Thus, at this point in time, it seems appropriate to make VFIO migration non-experimental: remove the x prefix from enable_migration property, change it to ON_OFF_AUTO and let the default value be AUTO. In addition, make the following adjustments: 1. When enable_migration is ON and migration is not supported, fail VFIO device realization. 2. When enable_migration is AUTO (i.e., not explicitly enabled), require device dirty tracking support. This is because device dirty tracking is currently the only method to do dirty page tracking, which is essential for migrating in a reasonable downtime. Setting enable_migration to ON will not require device dirty tracking. 3. Make migration error and blocker messages more elaborate. 4. Remove error prints in vfio_migration_query_flags(). 5. Rename trace_vfio_migration_probe() to trace_vfio_migration_realize(). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	808642a2f6	vfio/migration: Reset bytes_transferred properly Currently, VFIO bytes_transferred is not reset properly: 1. bytes_transferred is not reset after a VM snapshot (so a migration following a snapshot will report incorrect value). 2. bytes_transferred is a single counter for all VFIO devices, however upon migration failure it is reset multiple times, by each VFIO device. Fix it by introducing a new function vfio_reset_bytes_transferred() and calling it during migration and snapshot start. Remove existing bytes_transferred reset in VFIO migration state notifier, which is not needed anymore. Fixes: `3710586caa` ("qapi: Add VFIO devices migration stats in Migration stats") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Shameer Kolothum	c174088923	vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path When vfio_enable_vectors() returns with less than requested nr_vectors we retry with what kernel reported back. But the retry path doesn't call vfio_prepare_kvm_msi_virq_batch() and this results in, qemu-system-aarch64: vfio: Error: Failed to enable 4 MSI vectors, retry with 1 qemu-system-aarch64: ../hw/vfio/pci.c:602: vfio_commit_kvm_msi_virq_batch: Assertion `vdev->defer_kvm_irq_routing' failed Fixes: `dc580d51f7` ("vfio: defer to commit kvm irq routing when enable msi/msix") Reviewed-by: Longpeng <longpeng2@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Alex Williamson	f6b30c1984	hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset previously reserved for use by hypervisors to implement the GPUDirect Cliques capability. A revised specification provides an alternate location. Add a config space walk to the quirk to check for conflicts, allowing us to fall back to the new location or generate an error at the quirk setup rather than when the real conflicting capability is added should there be no available location. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Alex Williamson	634f38f0f7	vfio: Implement a common device info helper A common helper implementing the realloc algorithm for handling capabilities. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Robin Voetter <robin@streamhpc.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	745c42912a	vfio/migration: Add support for switchover ack capability Loading of a VFIO device's data can take a substantial amount of time as the device may need to allocate resources, prepare internal data structures, etc. This can increase migration downtime, especially for VFIO devices with a lot of resources. To solve this, VFIO migration uAPI defines "initial bytes" as part of its precopy data stream. Initial bytes can be used in various ways to improve VFIO migration performance. For example, it can be used to transfer device metadata to pre-allocate resources in the destination. However, for this to work we need to make sure that all initial bytes are sent and loaded in the destination before the source VM is stopped. Use migration switchover ack capability to make sure a VFIO device's initial bytes are sent and loaded in the destination before the source stops the VM and attempts to complete the migration. This can significantly reduce migration downtime for some devices. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	eda7362af9	vfio/migration: Add VFIO migration pre-copy support Pre-copy support allows the VFIO device data to be transferred while the VM is running. This helps to accommodate VFIO devices that have a large amount of data that needs to be transferred, and it can reduce migration downtime. Pre-copy support is optional in VFIO migration protocol v2. Implement pre-copy of VFIO migration protocol v2 and use it for devices that support it. Full description of it can be found in the following Linux commit: 4db52602a607 ("vfio: Extend the device migration protocol with PRE_COPY"). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	6cd1fe1159	vfio/migration: Store VFIO migration flags in VFIOMigration VFIO migration flags are queried once in vfio_migration_init(). Store them in VFIOMigration so they can be used later to check the device's migration capabilities without re-querying them. This will be used in the next patch to check if the device supports precopy migration. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	cf53efbbda	vfio/migration: Refactor vfio_save_block() to return saved data size Refactor vfio_save_block() to return the size of saved data on success and -errno on error. This will be used in next patch to implement VFIO migration pre-copy support. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Joao Martins	6fe4f6c941	hw/vfio: Add number of dirty pages to vfio_get_dirty_bitmap tracepoint Include the number of dirty pages on the vfio_get_dirty_bitmap tracepoint. These are fetched from the newly added return value in cpu_physical_memory_set_dirty_lebitmap(). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20230530180556.24441-3-joao.m.martins@oracle.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-06-13 11:28:58 +02:00
Zhenzhong Duan	b83b40b614	vfio/pci: Fix a use-after-free issue vbasedev->name is freed wrongly which leads to garbage VFIO trace log. Fix it by allocating a dup of vbasedev->name and then free the dup. Fixes: `2dca1b37a7` ("vfio/pci: add support for VF token") Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-05-24 09:21:22 +02:00
Alex Williamson	b5048a4cbf	vfio/pci: Static Resizable BAR capability The PCI Resizable BAR (ReBAR) capability is currently hidden from the VM because the protocol for interacting with the capability does not support a mechanism for the device to reject an advertised supported BAR size. However, when assigned to a VM, the act of resizing the BAR requires adjustment of host resources for the device, which absolutely can fail. Linux does not currently allow us to reserve resources for the device independent of the current usage. The only writable field within the ReBAR capability is the BAR Size register. The PCIe spec indicates that when written, the device should immediately begin to operate with the provided BAR size. The spec however also notes that software must only write values corresponding to supported sizes as indicated in the capability and control registers. Writing unsupported sizes produces undefined results. Therefore, if the hypervisor were to virtualize the capability and control registers such that the current size is the only indicated available size, then a write of anything other than the current size falls into the category of undefined behavior, where we can essentially expose the modified ReBAR capability as read-only. This may seem pointless, but users have reported that virtualizing the capability in this way not only allows guest software to expose related features as available (even if only cosmetic), but in some scenarios can resolve guest driver issues. Additionally, no regressions in behavior have been reported for this change. A caveat here is that the PCIe spec requires for compatibility that devices report support for a size in the range of 1MB to 512GB, therefore if the current BAR size falls outside that range we revert to hiding the capability. Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230505232308.2869912-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-05-09 09:30:13 -06:00
Avihai Horon	ff180c6bd7	vfio/migration: Skip log_sync during migration SETUP state Currently, VFIO log_sync can be issued while migration is in SETUP state. However, doing this log_sync is at best redundant and at worst can fail. Redundant -- all RAM is marked dirty in migration SETUP state and is transferred only after migration is set to ACTIVE state, so doing log_sync during migration SETUP is pointless. Can fail -- there is a time window, between setting migration state to SETUP and starting dirty tracking by RAM save_live_setup handler, during which dirty tracking is still not started. Any VFIO log_sync call that is issued during this time window will fail. For example, this error can be triggered by migrating a VM when a GUI is active, which constantly calls log_sync. Fix it by skipping VFIO log_sync while migration is in SETUP state. Fixes: `758b96b61d` ("vfio/migrate: Move switch of dirty tracking into vfio_memory_listener") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Link: https://lore.kernel.org/r/20230403130000.6422-1-avihaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-05-09 09:30:13 -06:00
Minwoo Im	2dca1b37a7	vfio/pci: add support for VF token VF token was introduced [1] to kernel vfio-pci along with SR-IOV support [2]. This patch adds support VF token among PF and VF(s). To passthu PCIe VF to a VM, kernel >= v5.7 needs this. It can be configured with UUID like: -device vfio-pci,host=DDDD:BB:DD:F,vf-token=<uuid>,... [1] https://lore.kernel.org/linux-pci/158396393244.5601.10297430724964025753.stgit@gimli.home/ [2] https://lore.kernel.org/linux-pci/158396044753.5601.14804870681174789709.stgit@gimli.home/ Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Link: https://lore.kernel.org/r/20230320073522epcms2p48f682ecdb73e0ae1a4850ad0712fd780@epcms2p4 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-05-09 09:30:13 -06:00
Richard Henderson	cc37d98bfb	*: Add missing includes of qemu/error-report.h This had been pulled in via qemu/plugin.h from hw/core/cpu.h, but that will be removed. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230310195252.210956-5-richard.henderson@linaro.org> [AJB: add various additional cases shown by CI] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230315174331.2959-15-alex.bennee@linaro.org> Reviewed-by: Emilio Cota <cota@braap.org>	2023-03-22 15:06:57 +00:00
Cédric Le Goater	969dae5448	vfio: Fix vfio_get_dev_region() trace event Simply transpose 'x8' to fix the typo and remove the ending '8' Fixes: `e61a424f05` ("vfio: Create device specific region info helper") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1526 Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230303074330.2609377-1-clg@kaod.org [aw: commit log s/revert/transpose/] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 11:19:07 -07:00
Alex Williamson	8249cffc62	vfio/migration: Rename entry points Pick names that align with the section drivers should use them from, avoiding the confusion of calling a _finalize() function from _exit() and generalizing the actual _finalize() to handle removing the viommu blocker. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/167820912978.606734.12740287349119694623.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 11:19:07 -07:00
Joao Martins	95b29658b6	vfio/migration: Query device dirty page tracking support Now that everything has been set up for device dirty page tracking, query the device for device dirty page tracking support. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-15-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 10:21:22 -07:00
Joao Martins	e46883204c	vfio/migration: Block migration with vIOMMU Migrating with vIOMMU will require either tracking maximum IOMMU supported address space (e.g. 39/48 address width on Intel) or range-track current mappings and dirty track the new ones post starting dirty tracking. This will be done as a separate series, so add a live migration blocker until that is fixed. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-14-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 10:21:22 -07:00
Joao Martins	b153402a89	vfio/common: Add device dirty page bitmap sync Add device dirty page bitmap sync functionality. This uses the device DMA logging uAPI to sync dirty page bitmap from the device. Device dirty page bitmap sync is used only if all devices within a container support device dirty page tracking. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-13-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 10:21:22 -07:00
Avihai Horon	6607109f05	vfio/common: Extract code from vfio_get_dirty_bitmap() to new function Extract the VFIO_IOMMU_DIRTY_PAGES ioctl code in vfio_get_dirty_bitmap() to its own function. This will help the code to be more readable after next patch will add device dirty page bitmap sync functionality. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-12-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 10:21:22 -07:00
Joao Martins	5255bbf4ec	vfio/common: Add device dirty page tracking start/stop Add device dirty page tracking start/stop functionality. This uses the device DMA logging uAPI to start and stop dirty page tracking by device. Device dirty page tracking is used only if all devices within a container support device dirty page tracking. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20230307125450.62409-11-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 10:21:11 -07:00
Joao Martins	62c1b0024b	vfio/common: Record DMA mapped IOVA ranges According to the device DMA logging uAPI, IOVA ranges to be logged by the device must be provided all at once upon DMA logging start. As preparation for the following patches which will add device dirty page tracking, keep a record of all DMA mapped IOVA ranges so later they can be used for DMA logging start. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-10-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00
Joao Martins	4ead830848	vfio/common: Add helper to consolidate iova/end calculation In preparation to be used in device dirty tracking, move the code that calculate a iova/end range from the container/section. This avoids duplication on the common checks across listener callbacks. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-9-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00
Joao Martins	b92f237635	vfio/common: Consolidate skip/invalid section into helper The checks are replicated against region_add and region_del and will be soon added in another memory listener dedicated for dirty tracking. Move these into a new helper for avoid duplication. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Link: https://lore.kernel.org/r/20230307125450.62409-8-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00
Joao Martins	1cd7fa7adc	vfio/common: Use a single tracepoint for skipped sections In preparation to turn more of the memory listener checks into common functions, one of the affected places is how we trace when sections are skipped. Right now there is one for each. Change it into one single tracepoint `vfio_listener_region_skip` which receives a name which refers to the callback i.e. region_add and region_del. Suggested-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-7-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00
Joao Martins	fbc6c92134	vfio/common: Add helper to validate iova/end against hostwin Move the code that finds the container host DMA window against a iova range. This avoids duplication on the common checks across listener callbacks. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Link: https://lore.kernel.org/r/20230307125450.62409-6-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00
Avihai Horon	725ccd7e41	vfio/common: Add VFIOBitmap and alloc function There are already two places where dirty page bitmap allocation and calculations are done in open code. To avoid code duplication, introduce VFIOBitmap struct and corresponding alloc function and use them where applicable. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-5-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00
Avihai Horon	236e0a45f5	vfio/common: Abort migration if dirty log start/stop/sync fails If VFIO dirty pages log start/stop/sync fails during migration, migration should be aborted as pages dirtied by VFIO devices might not be reported properly. This is not the case today, where in such scenario only an error is printed. Fix it by aborting migration in the above scenario. Fixes: `758b96b61d` ("vfio/migrate: Move switch of dirty tracking into vfio_memory_listener") Fixes: `b6dd6504e3` ("vfio: Add vfio_listener_log_sync to mark dirty pages") Fixes: `9e7b0442f2` ("vfio: Add ioctl to get dirty pages bitmap during dma unmap") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-4-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00
Avihai Horon	db9b829b15	vfio/common: Fix wrong %m usages There are several places where the %m conversion is used if one of vfio_dma_map(), vfio_dma_unmap() or vfio_get_dirty_bitmap() fail. The %m usage in these places is wrong since %m relies on errno value while the above functions don't report errors via errno. Fix it by using strerror() with the returned value instead. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-3-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2023-03-07 07:20:32 -07:00

1 2 3 4 5 ...

552 Commits