Commit Graph

174 Commits

Author SHA1 Message Date
Jose Ricardo Ziviani
15126cba86 vfio: Set MemoryRegionOps:max_access_size and min_access_size
Sets valid.max_access_size and valid.min_access_size to ensure safe
8-byte accesses to vfio. Today, 8-byte accesses are broken into pairs
of 4-byte calls that goes unprotected:

qemu_mutex_lock locked mutex 0x10905ad8
  vfio_region_write  (0001:03:00.0:region1+0xc0, 0x2020c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
  vfio_region_write  (0001:03:00.0:region1+0xc4, 0xa0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8

which occasionally leads to:

qemu_mutex_lock locked mutex 0x10905ad8
  vfio_region_write  (0001:03:00.0:region1+0xc0, 0x2030c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
  vfio_region_write  (0001:03:00.0:region1+0xc0, 0x1000c, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
  vfio_region_write  (0001:03:00.0:region1+0xc4, 0xb0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8
qemu_mutex_lock locked mutex 0x10905ad8
  vfio_region_write  (0001:03:00.0:region1+0xc4, 0xa0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8

causing strange errors in guest OS. With this patch, such accesses
are protected by the same lock guard:

qemu_mutex_lock locked mutex 0x10905ad8
vfio_region_write  (0001:03:00.0:region1+0xc0, 0x2000c, 4)
vfio_region_write  (0001:03:00.0:region1+0xc4, 0xb0000, 4)
qemu_mutex_unlock unlocked mutex 0x10905ad8

This happens because the 8-byte write should be broken into 4-byte
writes by memory.c:access_with_adjusted_size() in order to be under
the same lock. Today, it's done in exec.c:address_space_write_continue()
which was able to handle only 4 bytes due to a zero'ed
valid.max_access_size (see exec.c:memory_access_size()).

Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-05-03 14:52:34 -06:00
Peter Xu
698feb5e13 memory: add section range info for IOMMU notifier
In this patch, IOMMUNotifier.{start|end} are introduced to store section
information for a specific notifier. When notification occurs, we not
only check the notification type (MAP|UNMAP), but also check whether the
notified iova range overlaps with the range of specific IOMMU notifier,
and skip those notifiers if not in the listened range.

When removing an region, we need to make sure we removed the correct
VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well.

This patch is solving the problem that vfio-pci devices receive
duplicated UNMAP notification on x86 platform when vIOMMU is there. The
issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is
splitted by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK
this (splitted IOMMU region) is only happening on x86.

This patch also helps vhost to leverage the new interface as well, so
that vhost won't get duplicated cache flushes. In that sense, it's an
slight performance improvement.

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com>
[ehabkost: included extra vhost_iommu_region_del() change from Peter Xu]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20 15:22:41 -03:00
Alex Williamson
8f419c5b43 vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk
The NVIDIA BAR5 quirk is targeting an ioport BAR.  Some older devices
have a BAR5 which is not ioport and can induce a segfault here.  Test
the BAR type to skip these devices.

Link: https://bugs.launchpad.net/qemu/+bug/1678466
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-04-06 16:03:26 -06:00
Xiong Zhang
93587e3af3 Revert "vfio/pci-quirks.c: Disable stolen memory for igd VFIO"
This reverts commit c2b2e158cc.

The original patch intend to prevent linux i915 driver from using
stolen meory. But this patch breaks windows IGD driver loading on
Gen9+, as IGD HW will use stolen memory on Gen9+, once windows IGD
driver see zero size stolen memory, it will unload.
Meanwhile stolen memory will be disabled in 915 when i915 run as
a guest.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[aw: Gen9+ is SkyLake and newer]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-03-31 10:04:41 -06:00
XiongZhang
c2b2e158cc vfio/pci-quirks.c: Disable stolen memory for igd VFIO
Regardless of running in UPT or legacy mode, the guest igd
drivers may attempt to use stolen memory, however only legacy
mode has BIOS support for reserving stolen memmory in the
guest VM. We zero out the stolen memory size in all cases,
then guest igd driver won't use stolen memory.
In legacy mode, user could use x-igd-gms option to specify the
amount of stolen memory which will be pre-allocated and reserved
by bios for igd use.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99028
          https://bugs.freedesktop.org/show_bug.cgi?id=99025

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-22 13:19:59 -07:00
Alex Williamson
d0d1cd70d1 vfio/pci: Improve extended capability comments, skip masked caps
Since commit 4bb571d857 ("pci/pcie: don't assume cap id 0 is
reserved") removes the internal use of extended capability ID 0, the
comment here becomes invalid.  However, peeling back the onion, the
code is still correct and we still can't seed the capability chain
with ID 0, unless we want to muck with using the version number to
force the header to be non-zero, which is much uglier to deal with.
The comment also now covers some of the subtleties of using cap ID 0,
such as transparently indicating absence of capabilities if none are
added.  This doesn't detract from the correctness of the referenced
commit as vfio in the kernel also uses capability ID zero to mask
capabilties.  In fact, we should skip zero capabilities precisely
because the kernel might also expose such a capability at the head
position and re-introduce the problem.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Tested-by: Jintack Lim <jintack@cs.columbia.edu>
2017-02-22 13:19:58 -07:00
Alex Williamson
35c7cb4caf vfio/pci: Report errors from qdev_unplug() via device request
Currently we ignore this error, report it with error_reportf_err()

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2017-02-22 13:19:58 -07:00
Peter Xu
dfbd90e5b9 vfio: allow to notify unmap for very large region
Linux vfio driver supports to do VFIO_IOMMU_UNMAP_DMA for a very big
region. This can be leveraged by QEMU IOMMU implementation to cleanup
existing page mappings for an entire iova address space (by notifying
with an IOTLB with extremely huge addr_mask). However current
vfio_iommu_map_notify() does not allow that. It make sure that all the
translated address in IOTLB is falling into RAM range.

The check makes sense, but it should only be a sensible checker for
mapping operations, and mean little for unmap operations.

This patch moves this check into map logic only, so that we'll get
faster unmap handling (no need to translate again), and also we can then
better support unmapping a very big region when it covers non-ram ranges
or even not-existing ranges.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-17 21:52:31 +02:00
Peter Xu
4a4b88fbe1 vfio: introduce vfio_get_vaddr()
A cleanup for vfio_iommu_map_notify(). Now we will fetch vaddr even if
the operation is unmap, but it won't hurt much.

One thing to mention is that we need the RCU read lock to protect the
whole translation and map/unmap procedure.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-17 21:52:31 +02:00
Peter Xu
3213835720 vfio: trace map/unmap for notify as well
We traces its range, but we don't know whether it's a MAP/UNMAP. Let's
dump it as well.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-17 21:52:31 +02:00
Thomas Huth
e197de50c6 hw/vfio: Add CONFIG switches for calxeda-xgmac and amd-xgbe
Both devices seem to be specific to the ARM platform. It's confusing
for the users if they show up on other target architectures, too
(e.g. when the user runs QEMU with "-device ?" to get a list of
supported devices). Thus let's introduce proper configuration switches
so that the devices are only compiled and included when they are
really required.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-10 13:12:03 -07:00
Thomas Huth
f23363ea44 hw/vfio/pci-quirks: Set category of the "vfio-pci-igd-lpc-bridge" device
The device has "bridge" in its name, so it should obviously be in
the category DEVICE_CATEGORY_BRIDGE.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-10 13:12:03 -07:00
Alex Williamson
ac2a9862b7 vfio-pci: Fix GTT wrap-around for Skylake+ IGD
Previous IGD, up through Broadwell, only seem to write GTT values into
the first 1MB of space allocated for the BDSM, but clearly the GTT
can be multiple MB in size.  Our test in vfio_igd_quirk_data_write()
correctly filters out indexes beyond 1MB, but given the 1MB mask we're
using, we re-apply writes only to the first 1MB of the guest allocated
BDSM.

We can't assume either the host or guest BDSM is naturally aligned, so
we can't simply apply a different mask.  Instead, save the host BDSM
and do the arithmetic to subtract the host value to get the BDSM
offset and add it to the guest allocated BDSM.

Reported-by: Alexander Indenbaum <alexander.indenbaum@gmail.com>
Tested-by: Alexander Indenbaum <alexander.indenbaum@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-02-10 13:12:03 -07:00
Peter Maydell
4e9f5244e1 -----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYkeZAAAoJEJykq7OBq3PI6oUH/3qlRvQrWmhWLR+XCtwU0gON
 HRApL57Of+B1YbqJzb8wzjLMLfzZQYLoT7kf3FDRON751Iwpv2Qyl6j79kbmOQwy
 txvtgUTtPZrOZ9HMk6M1VboiKrkM1t0I1QiRYy/af2f1gD3KTqIt8YN1ic3xatKD
 Fgmx+oD+6EkrNilthemvDyaXtGsdTl4GC9ZbGcJB2VJzzWkksRUfeZWysIu9p2zP
 l6viegW/1+o5wYgBt6DxMalfNGbEiuBgXgx6PVFPbkw0xNURC52qDHhQ91xTSWt1
 pvFrIhYWR/ETN0twJh+jtmCjkawKWSsx2nrLlrSh4H0EpwFoRfFqH/ZrOFSg0wg=
 =QnCX
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Wed 01 Feb 2017 13:44:32 GMT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request:
  trace: clean up trace-events files
  qapi: add missing trace_visit_type_enum() call
  trace: improve error reporting when parsing simpletrace header
  trace: update docs to reflect new code generation approach
  trace: switch to modular code generation for sub-directories
  trace: move setting of group name into Makefiles
  trace: move hw/i386/xen events to correct subdir
  trace: move hw/xen events to correct subdir
  trace: move hw/block/dataplane events to correct subdir
  make: move top level dir to end of include search path

# Conflicts:
#	Makefile

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-02 16:08:28 +00:00
Cao jin
ee640c625e pci: Convert msix_init() to Error and fix callers
msix_init() reports errors with error_report(), which is wrong when
it's used in realize().  The same issue was fixed for msi_init() in
commit 1108b2f. In order to make the API change as small as possible,
leave the return value check to later patch.

For some devices(like e1000e, vmxnet3, nvme) who won't fail because of
msix_init's failure, suppress the error report by passing NULL error
object.

Bonus: add comment for msix_init.

CC: Jiri Pirko <jiri@resnulli.us>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Dmitry Fleytman <dmitry@daynix.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Hannes Reinecke <hare@suse.de>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-01 03:37:18 +02:00
Stefan Hajnoczi
7f4076c1bb trace: clean up trace-events files
There are a number of unused trace events that
scripts/cleanup-trace-events.pl finds.  The "hw/vfio/pci-quirks.c"
filename was typoed and "qapi/qapi-visit-core.c" was missing the qapi/
directory prefix.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20170126171613.1399-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-31 17:12:15 +00:00
Cao jin
8907379204 vfio: remove a duplicated word in comments
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-01-24 23:26:53 +03:00
Stefan Weil
b12227afb1 hw: Fix typos found by codespell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-01-24 23:26:52 +03:00
Yongji Xie
95251725e3 vfio: Add support for mmapping sub-page MMIO BARs
Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
to mmap sub-page BARs. This is the corresponding QEMU patch.
With those patches applied, we could passthrough sub-page BARs
to guest, which can help to improve IO performance for some devices.

In this patch, we expand MemoryRegions of these sub-page
MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION
with a valid size. The expanding size will be recovered when
the base address of sub-page BAR is changed and not page aligned
any more in guest. And we also set the priority of these BARs'
memory regions to zero in case of overlap with BARs which share
the same page with sub-page BARs in guest.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-31 09:53:04 -06:00
Ido Yariv
a52a4c4717 vfio/pci: fix out-of-sync BAR information on reset
When a PCI device is reset, pci_do_device_reset resets all BAR addresses
in the relevant PCIDevice's config buffer.

The VFIO configuration space stays untouched, so the guest OS may choose
to skip restoring the BAR addresses as they would seem intact. The PCI
device may be left non-operational.
One example of such a scenario is when the guest exits S3.

Fix this by resetting the BAR addresses in the VFIO configuration space
as well.

Signed-off-by: Ido Yariv <ido@wizery.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-31 09:53:04 -06:00
Alex Williamson
24acf72b9a vfio: Handle zero-length sparse mmap ranges
As reported in the link below, user has a PCI device with a 4KB BAR
which contains the MSI-X table.  This seems to hit a corner case in
the kernel where the region reports being mmap capable, but the sparse
mmap information reports a zero sized range.  It's not entirely clear
that the kernel is incorrect in doing this, but regardless, we need
to handle it.  To do this, fill our mmap array only with non-zero
sized sparse mmap entries and add an error return from the function
so we can tell the difference between nr_mmaps being zero based on
sparse mmap info vs lack of sparse mmap info.

NB, this doesn't actually change the behavior of the device, it only
removes the scary "Failed to mmap ... Performance may be slow" error
message.  We cannot currently create an mmap over the MSI-X table.

Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-31 09:53:03 -06:00
Alex Williamson
21e00fa55f memory: Replace skip_dump flag with "ram_device"
Setting skip_dump on a MemoryRegion allows us to modify one specific
code path, but the restriction we're trying to address encompasses
more than that.  If we have a RAM MemoryRegion backed by a physical
device, it not only restricts our ability to dump that region, but
also affects how we should manipulate it.  Here we recognize that
MemoryRegions do not change to sometimes allow dumps and other times
not, so we replace setting the skip_dump flag with a new initializer
so that we know exactly the type of region to which we're applying
this behavior.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31 09:53:03 -06:00
Cao jin
893bfc3cc8 vfio: fix duplicate function call
When vfio device is reset(encounter FLR, or bus reset), if need to do
bus reset(vfio_pci_hot_reset_one is called), vfio_pci_pre_reset &
vfio_pci_post_reset will be called twice.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:03 -06:00
Thorsten Kohfeldt
31e6a7b17b vfio/pci: Fix vfio_rtl8168_quirk_data_read address offset
Introductory comment for rtl8168 VFIO MSI-X quirk states:
At BAR2 offset 0x70 there is a dword data register,
         offset 0x74 is a dword address register.
vfio: vfio_bar_read(0000:05:00.0:BAR2+0x70, 4) = 0xfee00398 // read data

Thus, correct offset for data read is 0x70,
but function vfio_rtl8168_quirk_data_read() wrongfully uses offset 0x74.

Signed-off-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:02 -06:00
Eric Auger
4a94626850 vfio/pci: Handle host oversight
In case the end-user calls qemu with -vfio-pci option without passing
either sysfsdev or host property value, the device is interpreted as
0000:00:00.0. Let's create a specific error message to guide the end-user.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:02 -06:00
Eric Auger
e04cff9d97 vfio/pci: Remove vfio_populate_device returned value
The returned value (either -errno or -1) is not used anymore by the caller,
vfio_realize, since the error now is stored in the error object. So let's
remove it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:02 -06:00
Eric Auger
ec3bcf424e vfio/pci: Remove vfio_msix_early_setup returned value
The returned value is not used anymore by the caller, vfio_realize,
since the error now is stored in the error object. So let's remove it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:01 -06:00
Eric Auger
1a22aca1d0 vfio/pci: Conversion to realize
This patch converts VFIO PCI to realize function.

Also original initfn errors now are propagated using QEMU
error objects. All errors are formatted with the same pattern:
"vfio: %s: the error description"

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:01 -06:00
Eric Auger
9bdbfbd50d vfio/platform: Pass an error object to vfio_base_device_init
This patch propagates errors encountered during vfio_base_device_init
up to the realize function.

In case the host value is not set or badly formed we now report an
error.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:01 -06:00
Eric Auger
0d84f47bff vfio/platform: fix a wrong returned value in vfio_populate_device
In case the vfio_init_intp fails we currently do not return an
error value. This patch fixes the bug. The returned value is not
explicit but in practice the error object is the one used to
report the error to the end-user and the actual returned error
value is not used.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:00 -06:00
Eric Auger
5ff7419d4c vfio/platform: Pass an error object to vfio_populate_device
Propagate the vfio_populate_device errors up to vfio_base_device_init.
The error object also is passed to vfio_init_intp. At the moment we
only report the error. Subsequent patches will propagate the error
up to the realize function.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:00 -06:00
Eric Auger
59f7d6743c vfio: Pass an error object to vfio_get_device
Pass an error object to prepare for migration to VFIO-PCI realize.

In vfio platform vfio_base_device_init we currently just report the
error. Subsequent patches will propagate the error up to the realize
function.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:58:00 -06:00
Eric Auger
1b808d5be0 vfio: Pass an error object to vfio_get_group
Pass an error object to prepare for migration to VFIO-PCI realize.

For the time being let's just simply report the error in
vfio platform's vfio_base_device_init(). A subsequent patch will
duly propagate the error up to vfio_platform_realize.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:59 -06:00
Eric Auger
01905f58f1 vfio: Pass an Error object to vfio_connect_container
The error is currently simply reported in vfio_get_group. Don't
bother too much with the prefix which will be handled at upper level,
later on.

Also return an error value in case container->error is not 0 and
the container is teared down.

On vfio_spapr_remove_window failure, we also report an error whereas
it was silent before.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:59 -06:00
Eric Auger
7237011d05 vfio/pci: Pass an error object to vfio_pci_igd_opregion_init
Pass an error object to prepare for migration to VFIO-PCI realize.

In vfio_probe_igd_bar4_quirk, simply report the error.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:59 -06:00
Eric Auger
7ef165b9a8 vfio/pci: Pass an error object to vfio_add_capabilities
Pass an error object to prepare for migration to VFIO-PCI realize.
The error is cascaded downto vfio_add_std_cap and then vfio_msi(x)_setup,
vfio_setup_pcie_cap.

vfio_add_ext_cap does not return anything else than 0 so let's transform
it into a void function.

Also use pci_add_capability2 which takes an error object.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:58 -06:00
Eric Auger
7dfb34247e vfio/pci: Pass an error object to vfio_intx_enable
Pass an error object to prepare for migration to VFIO-PCI realize.

The error object is propagated down to vfio_intx_enable_kvm().

The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common()
and vfio_pci_post_reset() do not propagate the error and simply call
error_reportf_err() with the ERR_PREFIX formatting.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:58 -06:00
Eric Auger
008d0e2d7b vfio/pci: Pass an error object to vfio_msix_early_setup
Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.

We now format an error in case of reading failure for
- the MSIX flags
- the MSIX table,
- the MSIX PBA.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:58 -06:00
Eric Auger
2312d907dd vfio/pci: Pass an error object to vfio_populate_device
Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.

The case where error recovery cannot be enabled is not converted into
an error object but directly reported through error_report, as before.
Populating an error instead would cause the future realize function to
fail, which is not wanted.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:57 -06:00
Eric Auger
cde4279baa vfio/pci: Pass an error object to vfio_populate_vga
Pass an error object to prepare for the same operation in
vfio_populate_device. Eventually this contributes to the migration
to VFIO-PCI realize.

We now report an error on vfio_get_region_info failure.

vfio_probe_igd_bar4_quirk is not involved in the migration to realize
and simply calls error_reportf_err.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:57 -06:00
Eric Auger
426ec9049e vfio/pci: Use local error object in vfio_initfn
To prepare for migration to realize, let's use a local error
object in vfio_initfn. Also let's use the same error prefix for all
error messages.

On top of the 1-1 conversion, we start using a common error prefix for
all error messages. We also introduce a similar warning prefix which will
be used later on.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17 10:57:56 -06:00
Peter Xu
cdb3081269 memory: introduce IOMMUNotifier and its caps
IOMMU Notifier list is used for notifying IO address mapping changes.
Currently VFIO is the only user.

However it is possible that future consumer like vhost would like to
only listen to part of its notifications (e.g., cache invalidations).

This patch introduced IOMMUNotifier and IOMMUNotfierFlag bits for a
finer grained control of it.

IOMMUNotifier contains a bitfield for the notify consumer describing
what kind of notification it is interested in. Currently two kinds of
notifications are defined:

- IOMMU_NOTIFIER_MAP:    for newly mapped entries (additions)
- IOMMU_NOTIFIER_UNMAP:  for entries to be removed (cache invalidates)

When registering the IOMMU notifier, we need to specify one or multiple
types of messages to listen to.

When notifications are triggered, its type will be checked against the
notifier's type bits, and only notifiers with registered bits will be
notified.

(For any IOMMU implementation, an in-place mapping change should be
 notified with an UNMAP followed by a MAP.)

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-27 08:59:16 +02:00
David Gibson
6d17a018d0 vfio/pci: Fix regression in MSI routing configuration
d1f6af6 "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup
of kvmchip routing configuration, that was mostly intended for x86.
However, it also contains a subtle change in behaviour which breaks EEH[1]
error recovery on certain VFIO passthrough devices on spapr guests.  So far
it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be
other hardware with the same problem.  It's also possible there could be
circumstances where it causes a bug on x86 as well, though I don't know of
any obvious candidates.

Prior to d1f6af6, both vfio_msix_vector_do_use() and
vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this
as the "dummy" vector used to make the host hardware state sync with the
guest expected hardware state in terms of MSI configuration.

Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op,
meaning the dummy irq would always be delivered via qemu. d1f6af6 changed
vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg
parameter, and determines the correct message itself.  The test for !msg
was removed, and not replaced with anything there or in the caller.

With an spapr guest which has a VFIO device, if an EEH error occurs on the
host hardware, then the device will be isolated then reset.  This is a
combination of host and guest action, mediated by some EEH related
hypercalls.  I haven't fully traced the mechanics, but somehow installing
the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH
reset and recovery, at least some irqs are no longer delivered to the
guest.

In particular, the guest never gets the link up event, and so the NIC is
effectively dead.

[1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-*
    error reporting and recovery mechanism.  The concept is somewhat
    similar to PCI-E AER, but the details are different.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1373802

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Gavin Shan <gwshan@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-stable@nongnu.org
Fixes: d1f6af6a17 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-15 10:41:36 -06:00
Laurent Vivier
e723b87103 trace-events: fix first line comment in trace-events
Documentation is docs/tracing.txt instead of docs/trace-events.txt.

find . -name trace-events -exec \
     sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \
     {} \;

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-12 10:36:01 +01:00
Markus Armbruster
fea1c0999a vfio: Use error_report() instead of error_printf() for errors
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1470224274-31522-4-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-08-08 09:01:18 +02:00
Peter Xu
3f1fea0fb5 kvm-irqchip: do explicit commit when update irq
In the past, we are doing gsi route commit for each irqchip route
update. This is not efficient if we are updating lots of routes in the
same time. This patch removes the committing phase in
kvm_irqchip_update_msi_route(). Instead, we do explicit commit after all
routes updated.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:19 +03:00
Peter Xu
d1f6af6a17 kvm-irqchip: simplify kvm_irqchip_add_msi_route
Changing the original MSIMessage parameter in kvm_irqchip_add_msi_route
into the vector number. Vector index provides more information than the
MSIMessage, we can retrieve the MSIMessage using the vector easily. This
will avoid fetching MSIMessage every time before adding MSI routes.

Meanwhile, the vector info will be used in the coming patches to further
enable gsi route update notifications.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:18 +03:00
Alex Williamson
383a7af7ec vfio/pci: Hide ARI capability
QEMU supports ARI on downstream ports and assigned devices may support
ARI in their extended capabilities.  The endpoint ARI capability
specifies the next function, such that the OS doesn't need to walk
each possible function, however this next function is relative to the
host, not the guest.  This leads to device discovery issues when we
combine separate functions into virtual multi-function packages in a
guest.  For example, SR-IOV VFs are not enumerated by simply probing
the function address space, therefore the ARI next-function field is
zero.  When we combine multiple VFs together as a multi-function
device in the guest, the guest OS identifies ARI is enabled, relies on
this next-function field, and stops looking for additional function
after the first is found.

Long term we should expose the ARI capability to the guest to enable
configurations with more than 8 functions per slot, but this requires
additional QEMU PCI infrastructure to manage the next-function field
for multiple, otherwise independent devices.  In the short term,
hiding this capability allows equivalent functionality to what we
currently have on non-express chipsets.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-07-18 10:55:17 -06:00
David Gibson
21bb3093e6 vfio/spapr: Remove stale ioctl() call
This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an
earlier version of the code and has since been folded into
vfio_spapr_remove_window().

It wasn't caught because although the argument structure has been removed,
the libc function remove() means this didn't trigger a compile failure.
The ioctl() was also almost certain to fail silently and harmlessly with
the bogus argument, so this wasn't caught in testing.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-07-18 10:40:27 +10:00
Markus Armbruster
a9c94277f0 Use #include "..." for our own headers, <...> for others
Tracked down with an ugly, brittle and probably buggy Perl script.

Also move includes converted to <...> up so they get included before
ours where that's obviously okay.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:19:16 +02:00