qemu-e2k

Author	SHA1	Message	Date
Peter Xu	d7bb469afa	intel_iommu: reset intr_enabled when system reset This is found when I was debugging another problem. Until now no bug is reported with this but we'd better reset the IR status correctly after a system reset. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-02-05 10:58:33 -05:00
Peter Xu	2a078b1080	intel_iommu: fix operator in vtd_switch_address_space When calculating use_iommu, we wanted to first detect whether DMAR is enabled, then check whether PT is enabled if DMAR is enabled. However in the current code we used "&" rather than "&&" so the ordering requirement is lost (instead it'll be an "AND" operation). This could introduce errors dumped in QEMU console when rebooting a guest with both assigned device and vIOMMU, like: qemu-system-x86_64: vtd_dev_to_context_entry: invalid root entry: rsvd=0xf000ff53f000e2c3, val=0xf000ff53f000ff53 (reserved nonzero) Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-02-05 10:58:33 -05:00
Peter Xu	a924b3d8df	x86-iommu: switch intr_supported to OnOffAuto type Switch the intr_supported variable from a boolean to OnOffAuto type so that we can know whether the user specified it or not. With that we'll have a chance to help the user to choose more wisely where possible. Introduce x86_iommu_ir_supported() to mask these changes. No functional change at all. Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-12-20 13:25:11 -05:00
Peter Xu	4b49b586c4	intel_iommu: remove "x-" prefix for "aw-bits" We're going to have 57bits aw-bits support sooner. It's possibly time to remove the "x-" prefix. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-12-19 16:48:16 -05:00
Peter Xu	ccc23bb08a	intel_iommu: dma read/write draining support Support DMA read/write draining should be easy for existing VT-d emulation since the emulation itself does not have any request queue there so we don't need to do anything to flush the un-commited queue. What we need to do is to declare the support. These capabilities are required to pass Windows SVVP test program. It is verified that when with parameters "x-aw-bits=48,caching-mode=off" we can pass the Windows SVVP test with this patch applied. Otherwise we'll fail with: IOMMU[0] - DWD (DMA write draining) not supported IOMMU[0] - DWD (DMA read draining) not supported Segment 0 has no DMA remapping capable IOMMU units However since these bits are not declared support for QEMU<=3.1, we'll need a compatibility bit for it and we turn this on by default only for QEMU>=4.0. Please refer to VT-d spec 6.5.4 for more information. CC: Yu Wang <wyu@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1654550 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-12-19 16:48:16 -05:00
Peter Xu	095955b24d	intel_iommu: convert invalid traces into error reports Report more *_invalid() tracepoints to error_report_once() so that we can detect issues even without tracing enabled. Drop those tracepoints. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-12-19 16:48:16 -05:00
Peter Xu	662b4b69ba	intel_iommu: dump correct iova when failed The iotlb.iova can be zero if failure really happened. Dump the addr instead. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-12-19 16:48:16 -05:00
Singh, Brijesh	35c2450191	x86_iommu: move vtd_generate_msi_message in common file The vtd_generate_msi_message() in intel-iommu is used to construct a MSI Message from IRQ. A similar function will be needed when we add interrupt remapping support in amd-iommu. Moving the function in common file to avoid the code duplication. Rename it to x86_iommu_irq_to_msi_message(). There is no logic changes in the code flow. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-11-05 13:24:02 -05:00
Singh, Brijesh	50662ce16d	x86_iommu: move the kernel-irqchip check in common code Interrupt remapping needs kernel-irqchip={off\|split} on both Intel and AMD platforms. Move the check in common place. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-11-05 13:24:02 -05:00
Peter Xu	c28b535d08	intel_iommu: handle invalid ce for shadow sync We should handle VTD_FR_CONTEXT_ENTRY_P properly when synchronizing shadow page tables. Having invalid context entry there is perfectly valid when we move a device out of an existing domain. When that happens, instead of posting an error we invalidate the whole region. Without this patch, QEMU will crash if we do these steps: (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe) (2) bind the NICs with vfio-pci in the guest (3) start testpmd with the NICs applied (4) stop testpmd (5) rebind the NIC back to ixgbe kernel driver The patch should fix it. Reported-by: Pei Zhang <pezhang@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-11-05 13:24:02 -05:00
Peter Xu	95ecd3df78	intel_iommu: move ce fetching out when sync shadow There are two callers for vtd_sync_shadow_page_table_range(): one provided a valid context entry and one not. Move that fetching operation into the caller vtd_sync_shadow_page_table() where we need to fetch the context entry. Meanwhile, remove the error_report_once() directly since we're already tracing all the error cases in the previous call. Instead, return error number back to caller. This will not change anything functional since callers are dropping it after all. We do this move majorly because we want to do something more later in vtd_sync_shadow_page_table(). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-11-05 13:24:02 -05:00
Peter Xu	2cc9ddcceb	intel_iommu: better handling of dmar state switch QEMU is not handling the global DMAR switch well, especially when from "on" to "off". Let's first take the example of system reset. Assuming that a guest has IOMMU enabled. When it reboots, we will drop all the existing DMAR mappings to handle the system reset, however we'll still keep the existing memory layouts which has the IOMMU memory region enabled. So after the reboot and before the kernel reloads again, there will be no mapping at all for the host device. That's problematic since any software (for example, SeaBIOS) that runs earlier than the kernel after the reboot will assume the IOMMU is disabled, so any DMA from the software will fail. For example, a guest that boots on an assigned NVMe device might fail to find the boot device after a system reboot/reset and we'll be able to observe SeaBIOS errors if we capture the debugging log: WARNING - Timeout at nvme_wait:144! Meanwhile, we should see DMAR errors on the host of that NVMe device. It's the DMA fault that caused a NVMe driver timeout. The correct fix should be that we do proper switching of device DMA address spaces when system resets, which will setup correct memory regions and notify the backend of the devices. This might not affect much on non-assigned devices since QEMU VT-d emulation will assume a default passthrough mapping if DMAR is not enabled in the GCMD register (please refer to vtd_iommu_translate). However that's required for an assigned devices, since that'll rebuild the correct GPA to HPA mapping that is needed for any DMA operation during guest bootstrap. Besides the system reset, we have some other places that might change the global DMAR status and we'd better do the same thing there. For example, when we change the state of GCMD register, or the DMAR root pointer. Do the same refresh for all these places. For these two places we'll also need to explicitly invalidate the context entry cache and iotlb cache. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1625173 CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Cong Li <coli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> -- v2: - do the same for GCMD write, or root pointer update [Alex] - test is carried out by me this time, by observing the vtd_switch_address_space tracepoint after system reboot v3: - rewrite commit message as suggested by Alex Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-11-05 13:24:02 -05:00
Peter Xu	06aba4ca52	intel_iommu: introduce vtd_reset_caches() Provide the function and use it in vtd_init(). Used to reset both context entry cache and iotlb cache for the whole IOMMU unit. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-11-05 13:24:02 -05:00
Peter Xu	4e4abd111a	intel-iommu: replace more vtd_err_* traces Replace all the trace_vtd_err_*() hooks with the new error_report_once() since they are similar to trace_vtd_err() - dumping the first error would be mostly enough, then we have them on by default too. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815095328.32414-4-peterx@redhat.com> [Use "%x" instead of "%" PRIx16 to print uint16_t, whitespace tidied up] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-08-27 15:09:20 +02:00
Peter Xu	1376211f77	intel-iommu: start to use error_report_once Replace existing trace_vtd_err() with error_report_once() then stderr will capture something if any of the error happens, meanwhile we don't suffer from any DDOS. Then remove the trace point. Since at it, provide more information where proper (now we can pass parameters into the report function). Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815095328.32414-3-peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> [Two format strings fixed, whitespace tidied up] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-08-27 15:00:45 +02:00
Peter Maydell	2c91bcf273	iommu: Add IOMMU index argument to translate method Add an IOMMU index argument to the translate method of IOMMUs. Since all of our current IOMMU implementations support only a single IOMMU index, this has no effect on the behaviour. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-4-peter.maydell@linaro.org	2018-06-15 15:23:34 +01:00
Peter Maydell	cb1efcf462	iommu: Add IOMMU index argument to notifier APIs Add support for multiple IOMMU indexes to the IOMMU notifier APIs. When initializing a notifier with iommu_notifier_init(), the caller must pass the IOMMU index that it is interested in. When a change happens, the IOMMU implementation must pass memory_region_notify_iommu() the IOMMU index that has changed and that notifiers must be called for. IOMMUs which support only a single index don't need to change. Callers which only really support working with IOMMUs with a single index can use the result of passing MEMTXATTRS_UNSPECIFIED to memory_region_iommu_attrs_to_index(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-3-peter.maydell@linaro.org	2018-06-15 15:23:34 +01:00
Peter Xu	63b88968f1	intel-iommu: rework the page walk logic This patch fixes a potential small window that the DMA page table might be incomplete or invalid when the guest sends domain/context invalidations to a device. This can cause random DMA errors for assigned devices. This is a major change to the VT-d shadow page walking logic. It includes but is not limited to: - For each VTDAddressSpace, now we maintain what IOVA ranges we have mapped and what we have not. With that information, now we only send MAP or UNMAP when necessary. Say, we don't send MAP notifies if we know we have already mapped the range, meanwhile we don't send UNMAP notifies if we know we never mapped the range at all. - Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call in any places to resync the shadow page table for a device. - When we receive domain/context invalidation, we should not really run the replay logic, instead we use the new sync shadow page table API to resync the whole shadow page table without unmapping the whole region. After this change, we'll only do the page walk once for each domain invalidations (before this, it can be multiple, depending on number of notifiers per address space). While at it, the page walking logic is also refactored to be simpler. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Jintack Lim <jintack@cs.columbia.edu> Tested-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:34:05 +03:00
Peter Xu	d118c06ebb	intel-iommu: trace domain id during page walk This patch only modifies the trace points. Previously we were tracing page walk levels. They are redundant since we have page mask (size) already. Now we trace something much more useful which is the domain ID of the page walking. That can be very useful when we trace more than one devices on the same system, so that we can know which map is for which domain. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:33:58 +03:00
Peter Xu	2f764fa87d	intel-iommu: pass in address space when page walk We pass in the VTDAddressSpace too. It'll be used in the follow up patches. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:33:58 +03:00
Peter Xu	fe215b0cbb	intel-iommu: introduce vtd_page_walk_info During the recursive page walking of IOVA page tables, some stack variables are constant variables and never changed during the whole page walking procedure. Isolate them into a struct so that we don't need to pass those contants down the stack every time and multiple times. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:33:58 +03:00
Peter Xu	4f8a62a933	intel-iommu: only do page walk for MAP notifiers For UNMAP-only IOMMU notifiers, we don't need to walk the page tables. Fasten that procedure by skipping the page table walk. That should boost performance for UNMAP-only notifiers like vhost. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:33:57 +03:00
Peter Xu	1d9efa73e1	intel-iommu: add iommu lock SECURITY IMPLICATION: this patch fixes a potential race when multiple threads access the IOMMU IOTLB cache. Add a per-iommu big lock to protect IOMMU status. Currently the only thing to be protected is the IOTLB/context cache, since that can be accessed even without BQL, e.g., in IO dataplane. Note that we don't need to protect device page tables since that's fully controlled by the guest kernel. However there is still possibility that malicious drivers will program the device to not obey the rule. In that case QEMU can't really do anything useful, instead the guest itself will be responsible for all uncertainties. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:33:57 +03:00
Peter Xu	b4a4ba0d68	intel-iommu: remove IntelIOMMUNotifierNode That is not really necessary. Removing that node struct and put the list entry directly into VTDAddressSpace. It simplfies the code a lot. Since at it, rename the old notifiers_list into vtd_as_with_notifiers. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:33:57 +03:00
Peter Xu	36d2d52bdb	intel-iommu: send PSI always even if across PDEs SECURITY IMPLICATION: without this patch, any guest with both assigned device and a vIOMMU might encounter stale IO page mappings even if guest has already unmapped the page, which may lead to guest memory corruption. The stale mappings will only be limited to the guest's own memory range, so it should not affect the host memory or other guests on the host. During IOVA page table walking, there is a special case when the PSI covers one whole PDE (Page Directory Entry, which contains 512 Page Table Entries) or more. In the past, we skip that entry and we don't notify the IOMMU notifiers. This is not correct. We should send UNMAP notification to registered UNMAP notifiers in this case. For UNMAP only notifiers, this might cause IOTLBs cached in the devices even if they were already invalid. For MAP/UNMAP notifiers like vfio-pci, this will cause stale page mappings. This special case doesn't trigger often, but it is very easy to be triggered by nested device assignments, since in that case we'll possibly map the whole L2 guest RAM region into the device's IOVA address space (several GBs at least), which is far bigger than normal kernel driver usages of the device (tens of MBs normally). Without this patch applied to L1 QEMU, nested device assignment to L2 guests will dump some errors like: qemu-system-x86_64: VFIO_MAP_DMA: -17 qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000, 0x7f89a920d000) = -17 (File exists) CC: QEMU Stable <qemu-stable@nongnu.org> Acked-by: Jason Wang <jasowang@redhat.com> [peterx: rewrite the commit message] Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-05-23 17:33:57 +03:00
Jan Kiszka	b7a7bb358f	intel-iommu: Accept 64-bit writes to FEADDR Xen is doing this [1] and currently triggers an abort. [1] http://xenbits.xenproject.org/gitweb/?p=xen.git;a=blob;f=xen/drivers/passthrough/vtd/iommu.c;h=daaed0abbdd06b6ba3d948ea103aadf02651e83c;hb=refs/heads/master#l1108 Reported-by: Luis Lloret <luis_lloret@mentor.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-01 16:25:37 +02:00
Mohammed Gamal	29396ed9ac	x86_iommu: Move machine check to x86_iommu_realize() Instead of having the same error checks in vtd_realize() and amdvi_realize(), move that over to the generic x86_iommu_realize(). Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com>	2018-01-18 21:52:38 +02:00
Prasad Singamsetty	37f51384ae	intel-iommu: Extend address width to 48 bits The current implementation of Intel IOMMU code only supports 39 bits iova address width. This patch provides a new parameter (x-aw-bits) for intel-iommu to extend its address width to 48 bits but keeping the default the same (39 bits). The reason for not changing the default is to avoid potential compatibility problems with live migration of intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits' parameter are 39 and 48. After enabling larger address width (48), we should be able to map larger iova addresses in the guest. For example, a QEMU guest that is configured with large memory ( >=1TB ). To check whether 48 bits aw is enabled, we can grep in the guest dmesg output with line: "DMAR: Host address width 48". Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-18 21:52:38 +02:00
Prasad Singamsetty	92e5d85e83	intel-iommu: Redefine macros to enable supporting 48 bit address width The current implementation of Intel IOMMU code only supports 39 bits host/iova address width so number of macros use hard coded values based on that. This patch is to redefine them so they can be used with variable address widths. This patch doesn't add any new functionality but enables adding support for 48 bit address width. Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-18 21:52:38 +02:00
Peter Xu	4c427a4cf3	intel_iommu: fix error param in string It should be caching-mode. It may confuse people when it pops up. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-12-22 01:42:03 +02:00
Peter Xu	bf33cc75ad	intel_iommu: remove X86_IOMMU_PCI_DEVFN_MAX We have PCI_DEVFN_MAX now. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-12-22 01:42:03 +02:00
Peter Xu	66a4a0318e	intel_iommu: fix missing BQL in pt fast path In vtd_switch_address_space() we did the memory region switch, however it's possible that the caller of it has not taken the BQL at all. Make sure we have it. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-09-08 16:15:17 +03:00
Peter Xu	07f7b73398	intel_iommu: use access_flags for iotlb It was cached by read/write separately. Let's merge them. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:25 +03:00
Peter Xu	892721d91d	intel_iommu: fix iova for pt IOMMUTLBEntry.iova is returned incorrectly on one PT path (though mostly we cannot really trigger this path, even if we do, we are mostly disgarding this value, so it didn't break anything). Fix it by converting the VTD_PAGE_MASK into the correct definition VTD_PAGE_MASK_4K, then remove VTD_PAGE_MASK. Fixes: b93130 ("intel_iommu: cleanup vtd_{do_}iommu_translate()") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:25 +03:00
Alexey Kardashevskiy	1221a47467	memory/iommu: introduce IOMMUMemoryRegionClass This finishes QOM'fication of IOMMUMemoryRegion by introducing a IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alexey Kardashevskiy	3df9d74806	memory/iommu: QOM'fy IOMMU MemoryRegion This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Peter Xu	552a1e01a4	intel_iommu: fix migration breakage on mr switch Migration is broken after the vfio integration work: qemu-kvm: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address qemu-kvm: Failed to load ich9_ahci:ahci qemu-kvm: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci' qemu-kvm: load of migration failed: Operation not permitted The problem is that vfio work introduced dynamic memory region switching (actually it is also used for future PT mode), and this memory region layout is not properly delivered to destination when migration happens. Solution is to rebuild the layout in post_load. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1459906 Fixes: `558e0024` ("intel_iommu: allow dynamic switch of IOMMU region") Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-07-03 22:29:49 +03:00
Ladi Prosek	8991c460be	intel_iommu: relax iq tail check on VTD_GCMD_QIE enable The VT-d spec (section 6.5.2) prescribes software to zero the Invalidation Queue Tail Register before enabling the VTD_GCMD_QIE Global Command Register bit. Windows Server 2012 R2 and possibly other older Windows versions violate the protocol and set a non-zero queue tail first, which in effect makes them crash early on boot with -device intel-iommu,intremap=on. This commit relaxes the check and instead of failing to enable VTD_GCMD_QIE with vtd_err_qi_enable, it behaves as if the tail register was set just after enabling VTD_GCMD_QIE (see vtd_handle_iqt_write). Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-07-03 22:29:48 +03:00
Peter Xu	e7a3b91fdf	intel_iommu: cleanup vtd_interrupt_remap_msi() Move the memcpy upper into where needed, then share the trace so that we trace every correct remapping. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-06-16 18:44:56 +03:00
Peter Xu	b9313021f3	intel_iommu: cleanup vtd_{do_}iommu_translate() First, let vtd_do_iommu_translate() return a status, so that we explicitly knows whether error occured. Meanwhile, we make sure that IOMMUTLBEntry is filled in in that. Then, cleanup vtd_iommu_translate a bit. So even with PT we'll get a log now. Also, remove useless assignments. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-06-16 18:44:55 +03:00
Peter Xu	7feb51b709	intel_iommu: switching the rest DPRINTF to trace We have converted many of the DPRINTF() into traces. This patch does the last 100+ ones. To debug VT-d when error happens, let's try enable: -trace enable="vtd_err*" This should works just like the old GENERAL but of course better, since we don't need to recompile. Similar rules apply to the other modules. I was trying to make the prefix good enough for sub-module debugging. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-06-16 18:44:55 +03:00
Peter Xu	dbaabb25f4	intel_iommu: support passthrough (PT) Hardware support for VT-d device passthrough. Although current Linux can live with iommu=pt even without this, but this is faster than when using software passthrough. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@linux.intel.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	f80c98740e	intel_iommu: allow dev-iotlb context entry conditionally When device-iotlb is not specified, we should fail this check. A new function vtd_ce_type_check() is introduced. While I'm at it, clean up the vtd_dev_to_context_entry() a bit - replace many "else if" usage into direct if check. That'll make the logic more clear. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	5a38cb5940	intel_iommu: use IOMMU_ACCESS_FLAG() We have that now, so why not use it. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	127ff5c356	intel_iommu: provide vtd_ce_get_type() Helper to fetch VT-d context entry type. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	8f7d7161dd	intel_iommu: renaming context entry helpers The old names are too long and less ordered. Let's start to use vtd_ce_*() as a pattern. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	bf55b7afce	memory: tune last param of iommu_ops.translate() This patch converts the old "is_write" bool into IOMMUAccessFlags. The difference is that "is_write" can only express either read/write, but sometimes what we really want is "none" here (neither read nor write). Replay is an good example - during replay, we should not check any RW permission bits since thats not an actual IO at all. CC: Paolo Bonzini <pbonzini@redhat.com> CC: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Stefan Hajnoczi	adb354dd1e	pci, virtio, vhost: fixes A bunch of fixes that missed the release. Most notably we are reverting shpc back to enabled by default state as guests uses that as an indicator that hotplug is supported (even though it's unused). Unfortunately we can't fix this on the stable branch since that would break migration. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZHMOuAAoJECgfDbjSjVRp/5IH/3kOa7yV3KUi4QVbQV7WwBH3 LK+/jwIz4UhOZn+bS4qi+gjN6aFhNoBNDFmYsRTWKKdLMvZvkRBMDcv8DMIKeAyl kG/ispv8VI+GY/CRKnqzPm0FSulv8WPRryxkdGzK4oHiMv+4FpFR0v/n9NRHjwTA XNJ4k33IqBldXyZwwAzP5dT019EMvbn4bNrkLzlcF2w8mTWPf43eX/kIkRX0cAys 5IVTQVGEOwpnyV0jxJDP+aoVMrqv8xl88LLuRpTgWUo0UnxXL5/GZQOCCUN6DQ7M BOLmyyP9mT9k8iUI+fQsDxAtY7cL9torq+p985nQdH0nxmI3GCoufn9aJG0J9yc= =d34x -----END PGP SIGNATURE----- Merge remote-tracking branch 'mst/tags/for_upstream' into staging pci, virtio, vhost: fixes A bunch of fixes that missed the release. Most notably we are reverting shpc back to enabled by default state as guests uses that as an indicator that hotplug is supported (even though it's unused). Unfortunately we can't fix this on the stable branch since that would break migration. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Wed 17 May 2017 10:42:06 PM BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * mst/tags/for_upstream: exec: abstract address_space_do_translate() pci: deassert intx when pci device unrealize virtio: allow broken device to notify guest Revert "hw/pci: disable pci-bridge's shpc by default" acpi-defs: clean up open brace usage ACPI: don't call acpi_pcihp_device_plug_cb on xen iommu: Don't crash if machine is not PC_MACHINE pc: add 2.10 machine type pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware boot libvhost-user: fix crash when rings aren't ready hw/virtio: fix vhost user fails to startup when MQ hw/arm/virt: generate 64-bit addressable ACPI objects hw/acpi-defs: replace leading X with x_ in FADT field names Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-18 10:01:08 +01:00
Eduardo Habkost	8ab5700ca0	iommu: Remove FIXME comment about user_creatable=true amd-iommu and intel-iommu are really meant to be used with -device, so they need user_creatable=true. Remove the FIXME comment. Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-5-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-17 10:37:01 -03:00
Eduardo Habkost	e4f4fb1eca	sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE commit `33cd52b5d7` unset cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all sysbus devices appear on "-device help" and lack the "no-user" flag in "info qdm". To fix this, we can set user_creatable=false by default on TYPE_SYS_BUS_DEVICE, but this requires setting user_creatable=true explicitly on the sysbus devices that actually work with -device. Fortunately today we have just a few has_dynamic_sysbus=1 machines: virt, pc-q35-, ppce500, and spapr. virt, ppce500, and spapr have extra checks to ensure just a few device types can be instantiated: virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE. * ppce500 supports only TYPE_ETSEC_COMMON. * spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE. This patch sets user_creatable=true explicitly on those 4 device classes. Now, the more complex cases: pc-q35-: q35 has no sysbus device whitelist yet (which is a separate bug). We are in the process of fixing it and building a sysbus whitelist on q35, but in the meantime we can fix the "-device help" and "info qdm" bugs mentioned above. Also, despite not being strictly necessary for fixing the q35 bug, reducing the list of user_creatable=true devices will help us be more confident when building the q35 whitelist. xen: We also have a hack at xen_set_dynamic_sysbus(), that sets has_dynamic_sysbus=true at runtime when using the Xen accelerator. This hack is only used to allow xen-backend devices to be dynamically plugged/unplugged. This means today we can use -device with the following 22 device types, that are the ones compiled into the qemu-system-x86_64 and qemu-system-i386 binaries: allwinner-ahci * amd-iommu * cfi.pflash01 * esp * fw_cfg_io * fw_cfg_mem * generic-sdhci * hpet * intel-iommu * ioapic * isabus-bridge * kvmclock * kvm-ioapic * kvmvapic * SUNW,fdtwo * sysbus-ahci * sysbus-fdc * sysbus-ohci * unimplemented-device * virtio-mmio * xen-backend * xen-sysdev This patch adds user_creatable=true explicitly to those devices, temporarily, just to keep 100% compatibility with existing behavior of q35. Subsequent patches will remove user_creatable=true from the devices that are really not meant to user-creatable on any machine, and remove the FIXME comment from the ones that are really supposed to be user-creatable. This is being done in separate patches because we still don't have an obvious list of devices that will be whitelisted by q35, and I would like to get each device reviewed individually. Cc: Alexander Graf <agraf@suse.de> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Beniamino Galvani <b.galvani@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: Gabriel L. Somlo <somlo@cmu.edu> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Pierre Morel <pmorel@linux.vnet.ibm.com> Cc: Prasad J Pandit <pjp@fedoraproject.org> Cc: qemu-arm@nongnu.org Cc: qemu-block@nongnu.org Cc: qemu-ppc@nongnu.org Cc: Richard Henderson <rth@twiddle.net> Cc: Rob Herring <robh@kernel.org> Cc: Shannon Zhao <zhaoshenglong@huawei.com> Cc: sstabellini@kernel.org Cc: Thomas Huth <thuth@redhat.com> Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-3-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [ehabkost: Small changes at sysbus_device_class_init() comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-17 10:37:01 -03:00

1 2 3

105 Commits