qemu-e2k

Author	SHA1	Message	Date
Mohammed Gamal	29396ed9ac	x86_iommu: Move machine check to x86_iommu_realize() Instead of having the same error checks in vtd_realize() and amdvi_realize(), move that over to the generic x86_iommu_realize(). Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com>	2018-01-18 21:52:38 +02:00
Prasad Singamsetty	37f51384ae	intel-iommu: Extend address width to 48 bits The current implementation of Intel IOMMU code only supports 39 bits iova address width. This patch provides a new parameter (x-aw-bits) for intel-iommu to extend its address width to 48 bits but keeping the default the same (39 bits). The reason for not changing the default is to avoid potential compatibility problems with live migration of intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits' parameter are 39 and 48. After enabling larger address width (48), we should be able to map larger iova addresses in the guest. For example, a QEMU guest that is configured with large memory ( >=1TB ). To check whether 48 bits aw is enabled, we can grep in the guest dmesg output with line: "DMAR: Host address width 48". Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-18 21:52:38 +02:00
Prasad Singamsetty	92e5d85e83	intel-iommu: Redefine macros to enable supporting 48 bit address width The current implementation of Intel IOMMU code only supports 39 bits host/iova address width so number of macros use hard coded values based on that. This patch is to redefine them so they can be used with variable address widths. This patch doesn't add any new functionality but enables adding support for 48 bit address width. Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-01-18 21:52:38 +02:00
Peter Xu	4c427a4cf3	intel_iommu: fix error param in string It should be caching-mode. It may confuse people when it pops up. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-12-22 01:42:03 +02:00
Peter Xu	bf33cc75ad	intel_iommu: remove X86_IOMMU_PCI_DEVFN_MAX We have PCI_DEVFN_MAX now. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-12-22 01:42:03 +02:00
Peter Xu	66a4a0318e	intel_iommu: fix missing BQL in pt fast path In vtd_switch_address_space() we did the memory region switch, however it's possible that the caller of it has not taken the BQL at all. Make sure we have it. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-09-08 16:15:17 +03:00
Peter Xu	07f7b73398	intel_iommu: use access_flags for iotlb It was cached by read/write separately. Let's merge them. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:25 +03:00
Peter Xu	892721d91d	intel_iommu: fix iova for pt IOMMUTLBEntry.iova is returned incorrectly on one PT path (though mostly we cannot really trigger this path, even if we do, we are mostly disgarding this value, so it didn't break anything). Fix it by converting the VTD_PAGE_MASK into the correct definition VTD_PAGE_MASK_4K, then remove VTD_PAGE_MASK. Fixes: b93130 ("intel_iommu: cleanup vtd_{do_}iommu_translate()") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:25 +03:00
Alexey Kardashevskiy	1221a47467	memory/iommu: introduce IOMMUMemoryRegionClass This finishes QOM'fication of IOMMUMemoryRegion by introducing a IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alexey Kardashevskiy	3df9d74806	memory/iommu: QOM'fy IOMMU MemoryRegion This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Peter Xu	552a1e01a4	intel_iommu: fix migration breakage on mr switch Migration is broken after the vfio integration work: qemu-kvm: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address qemu-kvm: Failed to load ich9_ahci:ahci qemu-kvm: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci' qemu-kvm: load of migration failed: Operation not permitted The problem is that vfio work introduced dynamic memory region switching (actually it is also used for future PT mode), and this memory region layout is not properly delivered to destination when migration happens. Solution is to rebuild the layout in post_load. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1459906 Fixes: `558e0024` ("intel_iommu: allow dynamic switch of IOMMU region") Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-07-03 22:29:49 +03:00
Ladi Prosek	8991c460be	intel_iommu: relax iq tail check on VTD_GCMD_QIE enable The VT-d spec (section 6.5.2) prescribes software to zero the Invalidation Queue Tail Register before enabling the VTD_GCMD_QIE Global Command Register bit. Windows Server 2012 R2 and possibly other older Windows versions violate the protocol and set a non-zero queue tail first, which in effect makes them crash early on boot with -device intel-iommu,intremap=on. This commit relaxes the check and instead of failing to enable VTD_GCMD_QIE with vtd_err_qi_enable, it behaves as if the tail register was set just after enabling VTD_GCMD_QIE (see vtd_handle_iqt_write). Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-07-03 22:29:48 +03:00
Peter Xu	e7a3b91fdf	intel_iommu: cleanup vtd_interrupt_remap_msi() Move the memcpy upper into where needed, then share the trace so that we trace every correct remapping. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-06-16 18:44:56 +03:00
Peter Xu	b9313021f3	intel_iommu: cleanup vtd_{do_}iommu_translate() First, let vtd_do_iommu_translate() return a status, so that we explicitly knows whether error occured. Meanwhile, we make sure that IOMMUTLBEntry is filled in in that. Then, cleanup vtd_iommu_translate a bit. So even with PT we'll get a log now. Also, remove useless assignments. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-06-16 18:44:55 +03:00
Peter Xu	7feb51b709	intel_iommu: switching the rest DPRINTF to trace We have converted many of the DPRINTF() into traces. This patch does the last 100+ ones. To debug VT-d when error happens, let's try enable: -trace enable="vtd_err*" This should works just like the old GENERAL but of course better, since we don't need to recompile. Similar rules apply to the other modules. I was trying to make the prefix good enough for sub-module debugging. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-06-16 18:44:55 +03:00
Peter Xu	dbaabb25f4	intel_iommu: support passthrough (PT) Hardware support for VT-d device passthrough. Although current Linux can live with iommu=pt even without this, but this is faster than when using software passthrough. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@linux.intel.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	f80c98740e	intel_iommu: allow dev-iotlb context entry conditionally When device-iotlb is not specified, we should fail this check. A new function vtd_ce_type_check() is introduced. While I'm at it, clean up the vtd_dev_to_context_entry() a bit - replace many "else if" usage into direct if check. That'll make the logic more clear. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	5a38cb5940	intel_iommu: use IOMMU_ACCESS_FLAG() We have that now, so why not use it. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	127ff5c356	intel_iommu: provide vtd_ce_get_type() Helper to fetch VT-d context entry type. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	8f7d7161dd	intel_iommu: renaming context entry helpers The old names are too long and less ordered. Let's start to use vtd_ce_*() as a pattern. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Peter Xu	bf55b7afce	memory: tune last param of iommu_ops.translate() This patch converts the old "is_write" bool into IOMMUAccessFlags. The difference is that "is_write" can only express either read/write, but sometimes what we really want is "none" here (neither read nor write). Replay is an good example - during replay, we should not check any RW permission bits since thats not an actual IO at all. CC: Paolo Bonzini <pbonzini@redhat.com> CC: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Stefan Hajnoczi	adb354dd1e	pci, virtio, vhost: fixes A bunch of fixes that missed the release. Most notably we are reverting shpc back to enabled by default state as guests uses that as an indicator that hotplug is supported (even though it's unused). Unfortunately we can't fix this on the stable branch since that would break migration. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZHMOuAAoJECgfDbjSjVRp/5IH/3kOa7yV3KUi4QVbQV7WwBH3 LK+/jwIz4UhOZn+bS4qi+gjN6aFhNoBNDFmYsRTWKKdLMvZvkRBMDcv8DMIKeAyl kG/ispv8VI+GY/CRKnqzPm0FSulv8WPRryxkdGzK4oHiMv+4FpFR0v/n9NRHjwTA XNJ4k33IqBldXyZwwAzP5dT019EMvbn4bNrkLzlcF2w8mTWPf43eX/kIkRX0cAys 5IVTQVGEOwpnyV0jxJDP+aoVMrqv8xl88LLuRpTgWUo0UnxXL5/GZQOCCUN6DQ7M BOLmyyP9mT9k8iUI+fQsDxAtY7cL9torq+p985nQdH0nxmI3GCoufn9aJG0J9yc= =d34x -----END PGP SIGNATURE----- Merge remote-tracking branch 'mst/tags/for_upstream' into staging pci, virtio, vhost: fixes A bunch of fixes that missed the release. Most notably we are reverting shpc back to enabled by default state as guests uses that as an indicator that hotplug is supported (even though it's unused). Unfortunately we can't fix this on the stable branch since that would break migration. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Wed 17 May 2017 10:42:06 PM BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * mst/tags/for_upstream: exec: abstract address_space_do_translate() pci: deassert intx when pci device unrealize virtio: allow broken device to notify guest Revert "hw/pci: disable pci-bridge's shpc by default" acpi-defs: clean up open brace usage ACPI: don't call acpi_pcihp_device_plug_cb on xen iommu: Don't crash if machine is not PC_MACHINE pc: add 2.10 machine type pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware boot libvhost-user: fix crash when rings aren't ready hw/virtio: fix vhost user fails to startup when MQ hw/arm/virt: generate 64-bit addressable ACPI objects hw/acpi-defs: replace leading X with x_ in FADT field names Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-18 10:01:08 +01:00
Eduardo Habkost	8ab5700ca0	iommu: Remove FIXME comment about user_creatable=true amd-iommu and intel-iommu are really meant to be used with -device, so they need user_creatable=true. Remove the FIXME comment. Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-5-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-17 10:37:01 -03:00
Eduardo Habkost	e4f4fb1eca	sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE commit `33cd52b5d7` unset cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all sysbus devices appear on "-device help" and lack the "no-user" flag in "info qdm". To fix this, we can set user_creatable=false by default on TYPE_SYS_BUS_DEVICE, but this requires setting user_creatable=true explicitly on the sysbus devices that actually work with -device. Fortunately today we have just a few has_dynamic_sysbus=1 machines: virt, pc-q35-, ppce500, and spapr. virt, ppce500, and spapr have extra checks to ensure just a few device types can be instantiated: virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE. * ppce500 supports only TYPE_ETSEC_COMMON. * spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE. This patch sets user_creatable=true explicitly on those 4 device classes. Now, the more complex cases: pc-q35-: q35 has no sysbus device whitelist yet (which is a separate bug). We are in the process of fixing it and building a sysbus whitelist on q35, but in the meantime we can fix the "-device help" and "info qdm" bugs mentioned above. Also, despite not being strictly necessary for fixing the q35 bug, reducing the list of user_creatable=true devices will help us be more confident when building the q35 whitelist. xen: We also have a hack at xen_set_dynamic_sysbus(), that sets has_dynamic_sysbus=true at runtime when using the Xen accelerator. This hack is only used to allow xen-backend devices to be dynamically plugged/unplugged. This means today we can use -device with the following 22 device types, that are the ones compiled into the qemu-system-x86_64 and qemu-system-i386 binaries: allwinner-ahci * amd-iommu * cfi.pflash01 * esp * fw_cfg_io * fw_cfg_mem * generic-sdhci * hpet * intel-iommu * ioapic * isabus-bridge * kvmclock * kvm-ioapic * kvmvapic * SUNW,fdtwo * sysbus-ahci * sysbus-fdc * sysbus-ohci * unimplemented-device * virtio-mmio * xen-backend * xen-sysdev This patch adds user_creatable=true explicitly to those devices, temporarily, just to keep 100% compatibility with existing behavior of q35. Subsequent patches will remove user_creatable=true from the devices that are really not meant to user-creatable on any machine, and remove the FIXME comment from the ones that are really supposed to be user-creatable. This is being done in separate patches because we still don't have an obvious list of devices that will be whitelisted by q35, and I would like to get each device reviewed individually. Cc: Alexander Graf <agraf@suse.de> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Beniamino Galvani <b.galvani@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: Gabriel L. Somlo <somlo@cmu.edu> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Pierre Morel <pmorel@linux.vnet.ibm.com> Cc: Prasad J Pandit <pjp@fedoraproject.org> Cc: qemu-arm@nongnu.org Cc: qemu-block@nongnu.org Cc: qemu-ppc@nongnu.org Cc: Richard Henderson <rth@twiddle.net> Cc: Rob Herring <robh@kernel.org> Cc: Shannon Zhao <zhaoshenglong@huawei.com> Cc: sstabellini@kernel.org Cc: Thomas Huth <thuth@redhat.com> Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-3-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [ehabkost: Small changes at sysbus_device_class_init() comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-17 10:37:01 -03:00
Eduardo Habkost	ef0e8fc768	iommu: Don't crash if machine is not PC_MACHINE Currently it's possible to crash QEMU using "-device *-iommu" and "-machine none": $ qemu-system-x86_64 -machine none -device amd-iommu qemu/hw/i386/amd_iommu.c:1140:amdvi_realize: Object 0x55627dafbc90 is not an instance of type generic-pc-machine Aborted (core dumped) $ qemu-system-x86_64 -machine none -device intel-iommu qemu/hw/i386/intel_iommu.c:2972:vtd_realize: Object 0x56292ec0bc90 is not an instance of type generic-pc-machine Aborted (core dumped) Fix amd-iommu and intel-iommu to ensure the current machine is really a TYPE_PC_MACHINE instance at their realize methods. Resulting error messages: $ qemu-system-x86_64 -machine none -device amd-iommu qemu-system-x86_64: -device amd-iommu: Machine-type 'none' not supported by amd-iommu $ qemu-system-x86_64 -machine none -device intel-iommu qemu-system-x86_64: -device intel-iommu: Machine-type 'none' not supported by intel-iommu Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-05-10 22:04:23 +03:00
Peter Xu	dd4d607e40	intel_iommu: enable remote IOTLB This patch is based on Aviv Ben-David (<bd.aviv@gmail.com>)'s patch upstream: "IOMMU: enable intel_iommu map and unmap notifiers" https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html However I removed/fixed some content, and added my own codes. Instead of translate() every page for iotlb invalidations (which is slower), we walk the pages when needed and notify in a hook function. This patch enables vfio devices for VT-d emulation. And, since we already have vhost DMAR support via device-iotlb, a natural benefit that this patch brings is that vt-d enabled vhost can live even without ATS capability now. Though more tests are needed. Signed-off-by: Aviv Ben-David <bdaviv@cs.technion.ac.il> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-10-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-04-20 15:22:41 -03:00
Peter Xu	558e0024a4	intel_iommu: allow dynamic switch of IOMMU region This is preparation work to finally enabled dynamic switching ON/OFF for VT-d protection. The old VT-d codes is using static IOMMU address space, and that won't satisfy vfio-pci device listeners. Let me explain. vfio-pci devices depend on the memory region listener and IOMMU replay mechanism to make sure the device mapping is coherent with the guest even if there are domain switches. And there are two kinds of domain switches: (1) switch from domain A -> B (2) switch from domain A -> no domain (e.g., turn DMAR off) Case (1) is handled by the context entry invalidation handling by the VT-d replay logic. What the replay function should do here is to replay the existing page mappings in domain B. However for case (2), we don't want to replay any domain mappings - we just need the default GPA->HPA mappings (the address_space_memory mapping). And this patch helps on case (2) to build up the mapping automatically by leveraging the vfio-pci memory listeners. Another important thing that this patch does is to seperate IR (Interrupt Remapping) from DMAR (DMA Remapping). IR region should not depend on the DMAR region (like before this patch). It should be a standalone region, and it should be able to be activated without DMAR (which is a common behavior of Linux kernel - by default it enables IR while disabled DMAR). Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-9-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-04-20 15:22:41 -03:00
Peter Xu	f06a696dc9	intel_iommu: provide its own replay() callback The default replay() don't work for VT-d since vt-d will have a huge default memory region which covers address range 0-(2^64-1). This will normally consumes a lot of time (which looks like a dead loop). The solution is simple - we don't walk over all the regions. Instead, we jump over the regions when we found that the page directories are empty. It'll greatly reduce the time to walk the whole region. To achieve this, we provided a page walk helper to do that, invoking corresponding hook function when we found an page we are interested in. vtd_page_walk_level() is the core logic for the page walking. It's interface is designed to suite further use case, e.g., to invalidate a range of addresses. Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-8-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-04-20 15:22:41 -03:00
Jason Wang	10315b9b28	intel_iommu: use the correct memory region for device IOTLB notification We have a specific memory region for DMAR now, so it's wrong to trigger the notifier with the root region. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-7-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-04-20 15:22:41 -03:00
Peter Xu	7e58326ad7	intel_iommu: vtd_slpt_level_shift check level This helps in debugging incorrect level passed in. Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	6c441e1d61	intel_iommu: convert dbg macros to trace for trans Another patch to convert the DPRINTF() stuffs. This patch focuses on the address translation path and caching. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	bc535e59c4	intel_iommu: convert dbg macros to traces for inv VT-d codes are still using static DEBUG_INTEL_IOMMU macro. That's not good, and we should end the day when we need to recompile the code before getting useful debugging information for vt-d. Time to switch to the trace system. This is the first patch to do it. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	6e9055641b	intel_iommu: renaming gpa to iova where proper There are lots of places in current intel_iommu.c codes that named "iova" as "gpa". It is really confusing to use a name "gpa" in these places (which is very easily to be understood as "Guest Physical Address", while it's not). To make the codes (much) easier to be read, I decided to do this once and for all. No functional change is made. Only literal ones. Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Peter Xu	046ab7e9be	intel_iommu: simplify irq region translation Now we have a standalone memory region for MSI, all the irq region requests should be redirected there. Cleaning up the block with an assertion instead. Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Aviv Ben-David	3b40f0e53c	intel_iommu: add "caching-mode" option This capability asks the guest to invalidate cache before each map operation. We can use this invalidation to trap map operations in the hypervisor. Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com> [peterx: using "caching-mode" instead of "cache-mode" to align with spec] [peterx: re-write the subject to make it short and clear] Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-02-17 21:52:31 +02:00
Jason Wang	04eb6247eb	intel_iommu: fix and simplify size calculation in process_device_iotlb_desc() We don't use 1ULL which is wrong during size calculation. Fix it, and while at it, switch to use cto64() and adds a comments to make it simpler and easier to be understood. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-01 03:37:17 +02:00
Jason Wang	554f5e1604	intel_iommu: support device iotlb descriptor This patch enables device IOTLB support for intel iommu. The major work is to implement QI device IOTLB descriptor processing and notify the device through iommu notifier. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-01-10 05:56:58 +02:00
Jason Wang	2d3fc5816e	intel_iommu: allocate new key when creating new address space We use the pointer to stack for key for new address space, this will break hash table searching, fixing by g_malloc() a new key instead. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-01-10 05:56:58 +02:00
Jason Wang	e0a3c8ccaa	intel_iommu: name vtd address space with devfn To avoid duplicated name and ease debugging. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-01-10 05:56:58 +02:00
Peter Xu	8cdcf3c1e5	intel_iommu: allow migration IOMMU needs to be migrated before all the PCI devices (in case there are devices that will request for address translation). So marking it with a priority higher than the default (which PCI devices and other belong). Migration framework handled the rest. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-01-10 05:56:58 +02:00
Peter Xu	6cb99acc28	intel_iommu: fix incorrect device invalidate "mask" needs to be inverted before use. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-11-30 04:20:57 +02:00
Peter Xu	8e7a0a1616	intel_iommu: fix incorrect assert Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-11-15 17:20:36 +02:00
Peter Xu	1a43713b02	intel_iommu: fix several incorrect endianess and bit fields Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-11-15 17:20:36 +02:00
Jason Wang	bacabb0afa	intel_iommu: fixing source id during IOTLB hash key calculation Using uint8_t for source id will lose bus num and get the wrong/invalid IOTLB entry. Fixing by using uint16_t instead and enlarge level shift. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-11-15 17:20:36 +02:00
Radim Krčmář	fb506e701e	intel_iommu: reject broken EIM Cluster x2APIC cannot work without KVM's x2apic API when the maximal APIC ID is greater than 8 and only KVM's LAPIC can support x2APIC, so we forbid other APICs and also the old KVM case with less than 9, to simplify the code. There is no point in enabling EIM in forbidden APICs, so we keep it enabled only for the KVM APIC; unconditionally, because making the option depend on KVM version would be a maintanance burden. Old QEMUs would enable eim whenever intremap was on, which would trick guests into thinking that they can enable cluster x2APIC even if any interrupt destination would get clamped to 8 bits. Depending on your configuration, QEMU could notice that the destination LAPIC is not present and report it with a very non-obvious: KVM: injection failed, MSI lost (Operation not permitted) Or the guest could say something about unexpected interrupts, because clamping leads to aliasing so interrupts were being delivered to incorrect VCPUs. KVM_X2APIC_API is the feature that allows us to enable EIM for KVM. QEMU 2.7 allowed EIM whenever interrupt remapping was enabled. In order to keep backward compatibility, we again allow guests to misbehave in non-obvious ways, and make it the default for old machine types. A user can enable the buggy mode it with "x-buggy-eim=on". Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-10-17 15:44:49 -02:00
Radim Krčmář	e6b6af0560	intel_iommu: add OnOffAuto intr_eim as "eim" property The default (auto) emulates the current behavior. A user can now control EIM like -device intel-iommu,intremap=on,eim=off Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-10-17 15:44:49 -02:00
Radim Krčmář	6333e93c77	intel_iommu: redo configuraton check in realize * there no point in configuring the device if realization is going to fail, so move the check to the beginning, * create a separate function for the check, * use error_setg() instead error_report(). Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-10-17 15:44:49 -02:00
Radim Krčmář	329460191d	intel_iommu: pass whole remapped addresses to apic The MMIO interface to APIC only allowed 8 bit addresses, which is not enough for 32 bit addresses from EIM remapping. Intel stored upper 24 bits in the high MSI address, so use the same technique. The technique is also used in KVM MSI interface. Other APICs are unlikely to handle those upper bits. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-10-17 15:44:49 -02:00
Feng Wu	dea651a95a	intel-iommu: Check IOAPIC's Trigger Mode against the one in IRTE The Trigger Mode field of IOAPIC must match the Trigger Mode in the IRTE according to VT-d Spec 5.1.5.1. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2016-10-10 02:38:14 +03:00
Peter Xu	a3276f786c	intel_iommu, amd_iommu: allow UNMAP notifiers x86 vIOMMUs still lack of a complete IOMMU notifier mechanism. Before that is achieved, let's open a door for vhost DMAR support, which only requires cache invalidations (UNMAP operations). Meanwhile, convert hw_error() to error_report() and exit(1), to make the error messages cleaner and obvious (no CPU registers will be dumped). Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1474606948-14391-4-git-send-email-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 11:57:28 +02:00

1 2

79 Commits