AMD-Vi support for IOMMU sysfs. This allows us to associate devices
with a specific IOMMU device and examine the capabilities and features
of that IOMMU. The AMD IOMMU is hosted on and actual PCI device, so
we make that device the parent for the IOMMU class device. This
initial implementaiton exposes only the capability header and extended
features register for the IOMMU.
# find /sys | grep ivhd
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:00.0
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:02.0
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:04.0
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:09.0
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:11.0
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:12.0
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:12.2
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/devices/0000:00:13.0
...
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/power
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/power/control
...
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/device
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/subsystem
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/amd-iommu
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/amd-iommu/cap
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/amd-iommu/features
/sys/devices/pci0000:00/0000:00:00.2/iommu/ivhd0/uevent
/sys/class/iommu/ivhd0
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
set_device_exclusion_range(u16 devid, struct ivmd_header *m) enables
exclusion range for ONE device. IOMMU does not translate the access
to the exclusion range from the device.
The device is specified by input argument 'devid'. But 'devid' is not
passed to the actual set function set_dev_entry_bit(), instead
'm->devid' is passed. 'm->devid' does not specify the exact device
which needs enable the exclusion range. 'm->devid' represents DeviceID
field of IVMD, which has different meaning depends on IVMD type.
The caller init_exclusion_range() sets 'devid' for ONE device. When
m->type is equal to ACPI_IVMD_TYPE_ALL or ACPI_IVMD_TYPE_RANGE,
'm->devid' is not equal to 'devid'.
This patch fixes 'm->devid' to 'devid'.
Signed-off-by: Su Friendy <friendy.su@sony.com.cn>
Signed-off-by: Tamori Masahiro <Masahiro.Tamori@jp.sony.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
In reality, the spec can only support 16-bit PASID since
INVALIDATE_IOTLB_PAGES and COMPLETE_PPR_REQUEST commands only allow 16-bit
PASID. So, we updated the PASID_MASK accordingly and invoke BUG_ON
if the hardware is reporting PASmax more than 16-bit.
Besides, max PASID is defined as ((2^(PASmax+1)) - 1). The current does not
determine this correctly.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Replace direct inclusions of <acpi/acpi.h>, <acpi/acpi_bus.h> and
<acpi/acpi_drivers.h>, which are incorrect, with <linux/acpi.h>
inclusions and remove some inclusions of those files that aren't
necessary.
First of all, <acpi/acpi.h>, <acpi/acpi_bus.h> and <acpi/acpi_drivers.h>
should not be included directly from any files that are built for
CONFIG_ACPI unset, because that generally leads to build warnings about
undefined symbols in !CONFIG_ACPI builds. For CONFIG_ACPI set,
<linux/acpi.h> includes those files and for CONFIG_ACPI unset it
provides stub ACPI symbols to be used in that case.
Second, there are ordering dependencies between those files that always
have to be met. Namely, it is required that <acpi/acpi_bus.h> be included
prior to <acpi/acpi_drivers.h> so that the acpi_pci_root declarations the
latter depends on are always there. And <acpi/acpi.h> which provides
basic ACPICA type declarations should always be included prior to any other
ACPI headers in CONFIG_ACPI builds. That also is taken care of including
<linux/acpi.h> as appropriate.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (drivers/pci stuff)
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> (Xen stuff)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PCI core will initialize device MSI/MSI-X capability in
pci_msi_init_pci_dev(). So device driver should use
pci_dev->msi_cap/msix_cap to determine whether the device
support MSI/MSI-X instead of using
pci_find_capability(pci_dev, PCI_CAP_ID_MSI/MSIX). Access
to PCIe device config space again will consume more time.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Add functionality to check the availability of the AMD IOMMU Performance
Counters and export this functionality to other core drivers, such as in this
case, a perf AMD IOMMU PMU. This feature is not bound to any specific AMD
family/model other than the presence of the IOMMU with P-C enabled.
The AMD IOMMU P-C support static counting only at this time.
Signed-off-by: Steven Kinney <steven.kinney@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370466709-3212-2-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The updates are mostly about the x86 IOMMUs this time. Exceptions are
the groundwork for the PAMU IOMMU from Freescale (for a PPC platform)
and an extension to the IOMMU group interface. On the x86 side this
includes a workaround for VT-d to disable interrupt remapping on broken
chipsets. On the AMD-Vi side the most important new feature is a kernel
command-line interface to override broken information in IVRS ACPI
tables and get interrupt remapping working this way. Besides that there
are small fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRh2vAAAoJECvwRC2XARrjbjkP/jzKzeffybUpQsIJF8rs/IEt
hSwqpGLr6WR5FdneEH9fiBIp4pyMDXmuAb/2ZNgB+DgPN3xgqmWVo4WLk7pMo3BS
/xIz/lu7hIX3AtKt807pL9+rPdhGYEJ43Vmr4bW9x0l1kuNXy6fmMLcN5FaPKjV4
p4hY4jOstEgtYQw4wi39/9b4FsYoipZizkOUSdtCzWwTv7jOHH7/Wra8iZyzL6Je
1VlF/efp0ytTcwLdHOfGwPCIlZrQRtQCM4SqdAUG9bOL3ARR9Yu/0iW1295nbLzo
CQX5CfKePvo/fGxki1jcBi+UCyxYKPosB5kCxmh4MAxCg/VzzMsaME/A73tLJa6W
Y29bbjwPoBPMq03HX8S9R5QWY8HpFujUUp+J4TXcKuTgYEV28WfLu1uaeKD716nM
LoXUojov7Cj8ZQZnhyu5l+XNaephBZLfw/8bM6bAxhlKXwAjmLiS5Z+srPl1GJee
5GCV+L94JifHLZaREWh3JFsh9O3W7Wno2++c4JU32uCWJHXH7tMgs2P8n5AY9rnT
Km1a9y6w2MF3Gg9j4y6u75m0XnFTNzYjeJMUtqVlwVhNHhgaXfuIWY63xOQCLJs1
ThTHOjoh0VqONGobR/ywn+0ouo9X07DnWpluyFd+zY3XK0UE0NOu9XMr4i6TWxOf
mlzWoEKxtw36XGHB/FtQ
=MVc/
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates are mostly about the x86 IOMMUs this time.
Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a
PPC platform) and an extension to the IOMMU group interface.
On the x86 side this includes a workaround for VT-d to disable
interrupt remapping on broken chipsets. On the AMD-Vi side the most
important new feature is a kernel command-line interface to override
broken information in IVRS ACPI tables and get interrupt remapping
working this way.
Besides that there are small fixes all over the place."
* tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (24 commits)
iommu/tegra: Fix printk formats for dma_addr_t
iommu: Add a function to find an iommu group by id
iommu/vt-d: Remove warning for HPET scope type
iommu: Move swap_pci_ref function to drivers/iommu/pci.h.
iommu/vt-d: Disable translation if already enabled
iommu/amd: fix error return code in early_amd_iommu_init()
iommu/AMD: Per-thread IOMMU Interrupt Handling
iommu: Include linux/err.h
iommu/amd: Workaround for ERBT1312
iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters
iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides
iommu/amd: Add ioapic and hpet ivrs override
iommu/amd: Add early maps for ioapic and hpet
iommu/amd: Extend IVRS special device data structure
iommu/amd: Move add_special_device() to __init
iommu: Fix compile warnings with forward declarations
iommu/amd: Properly initialize irq-table lock
iommu/amd: Use AMD specific data structure for irq remapping
iommu/amd: Remove map_sg_no_iommu()
iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
...
Fix to return -ENOMEM int the memory alloc error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
In the current interrupt handling scheme, there are as many threads as
the number of IOMMUs. Each thread is created and assigned to an IOMMU at
the time of registering interrupt handlers (request_threaded_irq).
When an IOMMU HW generates an interrupt, the irq handler (top half) wakes up
the corresponding thread to process event and PPR logs of all IOMMUs
starting from the 1st IOMMU.
In the system with multiple IOMMU,this handling scheme complicates the
synchronization of the IOMMU data structures and status registers as
there could be multiple threads competing for the same IOMMU while
the other IOMMU could be left unhandled.
To simplify, this patch is proposing a different interrupt handling scheme
by having each thread only managing interrupts of the corresponding IOMMU.
This can be achieved by passing the struct amd_iommu when registering the
interrupt handlers. This structure is unique for each IOMMU and can be used
by the bottom half thread to identify the IOMMU to be handled instead
of calling for_each_iommu. Besides this also eliminate the needs to lock
the IOMMU for processing event and PPR logs.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
When the IVRS entries for IOAPIC and HPET are overridden on
the kernel command line, a problem detected in the check
function might not be a firmware bug anymore. So disable
the firmware bug reporting if the user provided valid
ivrs_ioapic or ivrs_hpet entries on the command line.
Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Add two new kernel commandline parameters ivrs_ioapic and
ivrs_hpet to override the Id->DeviceId mapping from the IVRS
ACPI table. This can be used to work around broken BIOSes to
get interrupt remapping working on AMD systems.
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This is needed in a later patch were ioapic_map and hpet_map
entries are created before the slab allocator is initialized
(and thus add_special_device() can't be used).
Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This patch extends the devid_map data structure to allow
ioapic and hpet entries in ivrs to be overridden on the
kernel command line.
Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
The function is only called by other __init functions, so it
can be moved to __init too.
Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Change to remove local PCI_BUS() define and use the new PCI_BUS_NUM()
interface from PCI.
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
commit 318fe78 ("IOMMU, AMD Family15h Model10-1Fh erratum 746 Workaround")
added amd_iommu_erratum_746_workaround and it's marked as __init, which is wrong
WARNING: drivers/iommu/built-in.o(.text+0x639c): Section mismatch in reference from the function iommu_init_pci() to the function .init.text:amd_iommu_erratum_746_workaround()
The function iommu_init_pci() references
the function __init amd_iommu_erratum_746_workaround().
This is often because iommu_init_pci lacks a __init
annotation or the annotation of amd_iommu_erratum_746_workaround is wrong.
Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
When dma_ops are initialized the unity mappings are
created. The init_device_table_dma() function makes sure DMA
from all devices is blocked by default. This opens a short
window in time where DMA to unity mapped regions is blocked
by the IOMMU. Make sure this does not happen by initializing
the device table after dma_ops.
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joro@8bytes.org>
The IOMMU may stop processing page translations due to a perceived lack
of credits for writing upstream peripheral page service request (PPR)
or event logs. If the L2B miscellaneous clock gating feature is enabled
the IOMMU does not properly register credits after the log request has
completed, leading to a potential system hang.
BIOSes are supposed to disable L2B micellaneous clock gating by setting
L2_L2B_CK_GATE_CONTROL[CKGateL2BMiscDisable](D0F2xF4_x90[2]) = 1b. This
patch corrects that for those which do not enable this workaround.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joro@8bytes.org>
On some systems the BIOS puts the wrong device-id for the
IO-APIC into the IVRS table. The result is that interrupt
remapping is not working for the IO-APIC irqs. This usually
means a kernel panic at boot because the timer is not
working.
Fix this kernel panic by disabling interrupt remapping if
this problem is discovered in the IVRS table.
Reported-by: Andrew Oakley <andrew@ado.is-a-geek.net>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Do not deinitialize the AMD IOMMU driver completly when
interrupt remapping is already in use but the initialization
of the DMA layer fails for some reason. Make sure the IOMMU
can still be used to remap interrupts.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add the six routines required to setup interrupt remapping
with the AMD IOMMU. Also put it all together into the AMD
specific irq_remap_ops.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The IVRS table usually includes the IOMMU device. But the
IOMMU does never translate itself, so make sure the IOMMU
driver knows this.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When the IOMMU is enabled very early (as with irq-remapping)
some devices are still in BIOS hand. When dma is blocked
early this can cause lots of IO_PAGE_FAULTs. So delay the
DMA initialization and do it right before the dma_ops are
initialized.
To be secure, block all interrupts by default when irq-remapping is
enabled in the system. They will be reenabled on demand
later. Without blocking interrupts by default devices can
issue arbitrary interrupts by sending special DMA packets to
the CPU that look like MSI messages. This is especially
dangerous when a device is assigned to a KVM guest because
the guest can then DoS the host.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When the IOAPIC information provided in the IVRS table is
not correct or not complete the system may not boot at all
when interrupt remapping is enabled. So check if this
information is correct and print out a firmware bug message
when it is not.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The irq remapping tables for the AMD IOMMU need to be
aligned on a 128 byte boundary. Create a seperate slab-cache
to guarantee this alignment.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The IVRS ACPI table provides information about the IOAPICs
and the HPETs available in the system and which PCI device
ID they use in transactions. Save that information for later
usage in interrupt remapping.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When the AMD IOMMU doesn't have extended features, an empty line
gets issued in dmesg like so:
[ 3.061417] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 3.066757] <---
[ 3.068294] pci 0000:00:00.2: irq 72 for MSI/MSI-X
[ 3.081213] AMD-Vi: Lazy IO/TLB flushing enabled
Fix it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Fix some typos in comments and user-visible messages. No
functional changes.
Signed-off-by: Frank Arnold <frank.arnold@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The pci_request_acs() function needs to be called before PCI
probing to be effective. So move it to another call-place to
ensure that.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This step makes it very easy to keep track about the current
intialization state of the iommu driver. With this change we
can initialize the IOMMU hardware to a point where it can
remap interrupts and later resume the initializion to enable
dma remapping.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This function will initialize everthing necessary so that
devices can do DMA. This includes dma_ops and iommu_ops.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This function will be called before the PCI subsystem is
initialized. Therefore dev_name doen't work and IOMMU
information can't be printed to the klog as before. Move the
code to print that information to a later point where PCI
initializtion has already happened.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
For interrupt remapping the relevant IOMMU initialization
needs to run earlier at boot when the PCI subsystem is not
yet initialized. To support that this patch splits the parts
of IOMMU initialization which need PCI accesses out of the
initial setup path so that this can be done later.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This makes it easier to propagate errors while parsing the
IVRS table and makes the amd_iommu_init_err hack obsolete.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
A few sparse warnings fire in drivers/iommu/amd_iommu_init.c.
Fix most of them with this patch. Also fix the sparse
warnings in drivers/iommu/irq_remapping.c while at it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
write_file_bool() modifies 32 bits of data, so "amd_iommu_unmap_flush"
needs to be 32 bits as well or we'll corrupt memory. Fortunately it
looks like the data is aligned with a gap after the declaration so this
is harmless in production.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The iommu_shutdown callback is not initialized when the AMD
IOMMU driver runs in passthrough mode. Fix that by moving
the callback initialization before the check for
passthrough mode.
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
At some point pci_get_bus_and_slot started to enable
interrupts. Since this function is used in the
amd_iommu_resume path it will enable interrupts on resume
which causes a warning. The fix will use a cached pointer
to the root-bridge to re-enable the IOMMU in case the BIOS
is broken.
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>