This patch enables device IOTLB support for intel iommu. The major
work is to implement QI device IOTLB descriptor processing and notify
the device through iommu notifier.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We use the pointer to stack for key for new address space, this will break hash
table searching, fixing by g_malloc() a new key instead.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To avoid duplicated name and ease debugging.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
IOMMU needs to be migrated before all the PCI devices (in case there are
devices that will request for address translation). So marking it with a
priority higher than the default (which PCI devices and other belong).
Migration framework handled the rest.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
"mask" needs to be inverted before use.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Using uint8_t for source id will lose bus num and get the
wrong/invalid IOTLB entry. Fixing by using uint16_t instead and
enlarge level shift.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cluster x2APIC cannot work without KVM's x2apic API when the maximal
APIC ID is greater than 8 and only KVM's LAPIC can support x2APIC, so we
forbid other APICs and also the old KVM case with less than 9, to
simplify the code.
There is no point in enabling EIM in forbidden APICs, so we keep it
enabled only for the KVM APIC; unconditionally, because making the
option depend on KVM version would be a maintanance burden.
Old QEMUs would enable eim whenever intremap was on, which would trick
guests into thinking that they can enable cluster x2APIC even if any
interrupt destination would get clamped to 8 bits.
Depending on your configuration, QEMU could notice that the destination
LAPIC is not present and report it with a very non-obvious:
KVM: injection failed, MSI lost (Operation not permitted)
Or the guest could say something about unexpected interrupts, because
clamping leads to aliasing so interrupts were being delivered to
incorrect VCPUs.
KVM_X2APIC_API is the feature that allows us to enable EIM for KVM.
QEMU 2.7 allowed EIM whenever interrupt remapping was enabled. In order
to keep backward compatibility, we again allow guests to misbehave in
non-obvious ways, and make it the default for old machine types.
A user can enable the buggy mode it with "x-buggy-eim=on".
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The default (auto) emulates the current behavior.
A user can now control EIM like
-device intel-iommu,intremap=on,eim=off
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* there no point in configuring the device if realization is going to
fail, so move the check to the beginning,
* create a separate function for the check,
* use error_setg() instead error_report().
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The MMIO interface to APIC only allowed 8 bit addresses, which is not
enough for 32 bit addresses from EIM remapping.
Intel stored upper 24 bits in the high MSI address, so use the same
technique. The technique is also used in KVM MSI interface.
Other APICs are unlikely to handle those upper bits.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The Trigger Mode field of IOAPIC must match the Trigger Mode in
the IRTE according to VT-d Spec 5.1.5.1.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
x86 vIOMMUs still lack of a complete IOMMU notifier mechanism.
Before that is achieved, let's open a door for vhost DMAR support,
which only requires cache invalidations (UNMAP operations).
Meanwhile, convert hw_error() to error_report() and exit(1), to make
the error messages cleaner and obvious (no CPU registers will be dumped).
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1474606948-14391-4-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new interface can be used to replace the old notify_started() and
notify_stopped(). Meanwhile it provides explicit flags so that IOMMUs
can know what kind of notifications it is requested for.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1474606948-14391-3-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add IVRS table for AMD IOMMU. Generate IVRS or DMAR
depending on emulated IOMMU.
Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Also avoid unnamed fields for portability.
Also, rename VTD_IRTE to VTD_IR_TableEntry for coding
style compliance.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When user specify "intremap=on" with "-M kernel-irqchip=on", throw error
and then quit.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linux guests do not gracefully handle cases when the invalidation mask
they wanted is not supported, probably because real hardware always
allowed all.
We can just say that all 16 masks are supported, because both
ioapic_iec_notifier and kvm_update_msi_routes_all invalidate all caches.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch enables SID validation. Invalid interrupts will be dropped.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As neither QEMU nor KVM support more than 255 CPUs so far, this is
simple: we only need to switch the destination ID translation in
vtd_remap_irq_get if EIME is set.
Once CFI support is there, it will have to take EIM into account as
well. So far, nothing to do for this.
This patch allows to use x2APIC in split irqchip mode of KVM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
[use le32_to_cpu() to retrieve dest_id]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
invalidation notifier list. When vIOMMU receives IEC invalidate
request, all the registered units will be notified with specific
invalidation requests.
Intel IOMMU is the first provider that generates such a event.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In split irqchip mode, IOAPIC is working in user space, only update
kernel irq routes when entry changed. When IR is enabled, we directly
update the kernel with translated messages. It works just like a kernel
cache for the remapping entries.
Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
long as we can support split irqchip, we will support irqfd as
well. Also, since kernel gsi routes will cache translated interrupts,
irqfd delivery will not suffer from any performance impact due to IR.
And, since we supported irqfd, vhost devices will be able to work
seamlessly with IR now. Logically this should contain both vhost-net and
vhost-user case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[move trace-events lines into target-i386/trace-events]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch translates all IOAPIC interrupts into MSI ones. One pseudo
ioapic address space is added to transfer the MSI message. By default,
it will be system memory address space. When IR is enabled, it will be
IOMMU address space.
Currently, only emulated IOAPIC is supported.
Idea suggested by Jan Kiszka and Rita Sinha in the following patch:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg01933.html
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Correct and portable in theory, but triggers warnings with older gcc
versions when -Wmissing-braces is enabled.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch enables interrupt remapping for PCI devices.
To play the trick, one memory region "iommu_ir" is added as child region
of the original iommu memory region, covering range 0xfeeXXXXX (which is
the address range for APIC). All the writes to this range will be taken
as MSI, and translation is carried out only when IR is enabled.
Idea suggested by Paolo Bonzini.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Handle writting to IRE bit in global command register.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Defined Interrupt Remap Table Address register to store IR table
pointer. Also, do proper handling on global command register writes to
store table pointer and its size.
One more debug flag "DEBUG_IR" is added for interrupt remapping.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Enable IR in IOMMU Extended Capability register.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Queued invalidation is required for IR. This patch add basic support for
interrupt cache invalidate requests. Since we currently have no IR cache
implemented yet, we can just skip all interrupt cache invalidation
requests for now.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introducing parent class for intel-iommu devices named "x86-iommu". This
is preparation work to abstract shared functionalities out from Intel
and AMD IOMMUs. Currently, only the parent class is introduced. It does
nothing yet.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use the standard '-device intel-iommu' to create the IOMMU device.
The legacy '-machine,iommu=on' can still be used.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't currently support the MemoryRegionIOMMUOps notifier, so throw
an error should a device require it.
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Current intel_iommu only supports 4K page which may not be sufficient
to cover guest working set. This patch tries to enable 2M and 1G mapping
for intel_iommu. This is also useful for future device IOTLB
implementation to have a better hit rate.
Major work is adding a page mask field on IOTLB entry to make it
support large page. And also use the slpte level as key to do IOTLB
lookup. MAMV was increased to 18 to support direct invalidation for 1G
mapping.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-11-git-send-email-peter.maydell@linaro.org
- Use a hash table indexed on bus pointers to store information about buses
instead of using the bus numbers.
Bus pointers are stored in a new VTDBus struct together with the vector
of device address space pointers indexed by devfn.
- The bus number is still used for lookup for selective SID based invalidate,
in which case the bus number is lazily resolved from the bus hash table and
cached in a separate index.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Switch all the uses of ld/st*_phys to address_space_ld/st*,
except for those cases where the address space is the CPU's
(ie cs->as). This was done with the following script which
generates a Coccinelle patch.
A few over-80-columns lines in the result were rewrapped by
hand where Coccinelle failed to do the wrapping automatically,
as well as one location where it didn't put a line-continuation
'\' when wrapping lines on a change made to a match inside
a macro definition.
===begin===
#!/bin/sh -e
# Usage:
# ./ldst-phys.spatch.sh > ldst-phys.spatch
# spatch -sp_file ldst-phys.spatch -dir . | sed -e '/^+/s/\t/ /g' > out.patch
# patch -p1 < out.patch
for FN in ub uw_le uw_be l_le l_be q_le q_be uw l q; do
cat <<EOF
@ cpu_matches_ld_${FN} @
expression E1,E2;
identifier as;
@@
ld${FN}_phys(E1->as,E2)
@ other_matches_ld_${FN} depends on !cpu_matches_ld_${FN} @
expression E1,E2;
@@
-ld${FN}_phys(E1,E2)
+address_space_ld${FN}(E1,E2, MEMTXATTRS_UNSPECIFIED, NULL)
EOF
done
for FN in b w_le w_be l_le l_be q_le q_be w l q; do
cat <<EOF
@ cpu_matches_st_${FN} @
expression E1,E2,E3;
identifier as;
@@
st${FN}_phys(E1->as,E2,E3)
@ other_matches_st_${FN} depends on !cpu_matches_st_${FN} @
expression E1,E2,E3;
@@
-st${FN}_phys(E1,E2,E3)
+address_space_st${FN}(E1,E2,E3, MEMTXATTRS_UNSPECIFIED, NULL)
EOF
done
===endit===
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Note that even after this patch, most callers of address_space_*
functions must still be under the big QEMU lock, otherwise the memory
region returned by address_space_translate can disappear as soon as
address_space_translate returns. This will be fixed in the next part
of this series.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add IOTLB to cache information about the translation of input-addresses. IOTLB
use a GHashTable as cache. The key of the hash table is the logical-OR of gfn
and source id after left-shifting.
Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add context-cache to cache context-entry encountered on a page-walk. Each
VTDAddressSpace has a member of VTDContextCacheEntry which represents an entry
in the context-cache. Since devices with different bus_num and devfn have their
respective VTDAddressSpace, this will be a good way to reference the cached
entries.
Each VTDContextCacheEntry will have a context_cache_gen and the cached entry
is valid only when context_cache_gen equals IntelIOMMUState.context_cache_gen.
Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add supports for queued invalidation interface, an expended invalidation
interface with extended capabilities.
Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add support for emulating Intel IOMMU according to the VT-d specification for
the q35 chipset machine. Implement the logics for DMAR (DMA remapping) without
PASID support. The emulation supports register-based invalidation and primary
fault logging.
Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>