docs/devel: Add VFIO iommufd backend documentation
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
parent 6106a32914
commit 98dad2b019
@@ -2176,6 +2176,7 @@ F: backends/iommufd.c
F: include/sysemu/iommufd.h
F: include/qemu/chardev_open.h
F: util/chardev_open.c
F: docs/devel/vfio-iommufd.rst

vhost
M: Michael S. Tsirkin <mst@redhat.com>
@@ -18,5 +18,6 @@ Details about QEMU's various subsystems including how to add features to them.
s390-dasd-ipl
tracing
vfio-migration
vfio-iommufd
writing-monitor-commands
virtio-backends
166
docs/devel/vfio-iommufd.rst
Normal file
@@ -0,0 +1,166 @@
===============================
IOMMUFD BACKEND usage with VFIO
===============================

(The terms backend, container and BE are used interchangeably.)

With the introduction of iommufd, the Linux kernel provides a generic
interface for user space drivers to propagate their DMA mappings to the
kernel for assigned devices. While the legacy kernel interface is
group-centric, the new iommufd interface is device-centric, relying on a
device fd and an iommufd.

To support both interfaces in the QEMU VFIO device, a base container
abstracts the common parts of the VFIO legacy and iommufd containers, so that
the generic VFIO code can use either container.

The base container implements generic functions such as memory_listener and
address space management, whereas the derived container implements callbacks
specific to either legacy or iommufd. Each container has its own way to set
up the secure context and its own DMA management interface. The diagram below
shows the layout with both containers.

::

                      VFIO                           AddressSpace/Memory
      +-------+  +----------+  +-----+  +-----+
      |  pci  |  | platform |  |  ap |  | ccw |
      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
          |           |           |        |        |   AddressSpace       |
          |           |           |        |        +------------+---------+
      +---V-----------V-----------V--------V----+               /
      |           VFIOAddressSpace              | <------------+
      |                  |                      |  MemoryListener
      |        VFIOContainerBase list           |
      +-------+----------------------------+----+
              |                            |
              |                            |
      +-------V------+            +--------V----------+
      |   iommufd    |            |    vfio legacy    |
      |  container   |            |     container     |
      +-------+------+            +--------+----------+
              |                            |
              | /dev/iommu                 | /dev/vfio/vfio
              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
  Userspace   |                            |
  ============+============================+===========================
  Kernel      |  device fd                 |
              +---------------+            | group/container fd
              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
              |  ATTACH_IOAS) |            | device fd
              |               |            |
              |       +-------V------------V-----------------+
      iommufd |       |                vfio                  |
  (map/unmap  |       +---------+--------------------+-------+
   ioas_copy) |                 |                    | map/unmap
              |                 |                    |
      +-------V------+    +-----V------+      +------V--------+
      | iommufd core |    |  device    |      |  vfio iommu   |
      +--------------+    +------------+      +---------------+

* Secure Context setup

  - iommufd BE: uses the device fd and iommufd to set up the secure context
    (bind_iommufd, attach_ioas)
  - vfio legacy BE: uses the group fd and container fd to set up the secure
    context (set_container, set_iommu)

* Device access

  - iommufd BE: the device fd is opened through ``/dev/vfio/devices/vfioX``
  - vfio legacy BE: the device fd is retrieved through a group fd ioctl

* DMA Mapping flow

  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
  2. VFIO populates DMA map/unmap via the container BEs

     * iommufd BE: uses iommufd
     * vfio legacy BE: uses container fd

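The device nodes used by each backend can be located from sysfs. The sketch
below is illustrative only: it assumes the usual sysfs layout, where a PCI
device has an ``iommu_group`` symlink and, once bound to a VFIO driver on a
kernel with VFIO cdev support, a ``vfio-dev`` directory. The function names
and the ``SYSFS`` override are hypothetical and not part of QEMU.

.. code-block:: bash

    # SYSFS defaults to /sys; overriding it is an assumption made for testing.
    SYSFS=${SYSFS:-/sys}

    # Legacy BE: the group node is named after the iommu_group symlink target.
    vfio_group_node() {
        local group
        group=$(readlink "$SYSFS/bus/pci/devices/$1/iommu_group") || return 1
        echo "/dev/vfio/${group##*/}"
    }

    # iommufd BE: the cdev is named after the entry under vfio-dev/.
    vfio_cdev_node() {
        local entry
        for entry in "$SYSFS/bus/pci/devices/$1/vfio-dev"/vfio*; do
            [ -e "$entry" ] || return 1
            echo "/dev/vfio/devices/${entry##*/}"
            return 0
        done
    }

For example, ``vfio_cdev_node 0000:02:00.0`` prints the ``vfioX`` node that
the iommufd BE opens, while ``vfio_group_node 0000:02:00.0`` prints the
``/dev/vfio/$group_id`` node used by the vfio legacy BE.
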
Example configuration
=====================

Step 1: configure the host device
---------------------------------

This step is exactly the same as for a VFIO device using the legacy VFIO
container.

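As a reminder of what that involves, the sketch below rebinds a PCI device to
the ``vfio-pci`` driver through sysfs. It is a minimal example, not a
complete setup script: ``bind_to_vfio_pci`` and the ``SYSFS`` override are
hypothetical names, and it assumes that the ``vfio-pci`` module is already
loaded and that the commands run as root.

.. code-block:: bash

    bind_to_vfio_pci() {
        local bdf=$1
        local sysfs=${SYSFS:-/sys}   # override is an assumption, for testing
        local dev=$sysfs/bus/pci/devices/$bdf

        # Detach the device from its current host driver, if it has one.
        if [ -e "$dev/driver" ]; then
            echo "$bdf" > "$dev/driver/unbind"
        fi
        # Restrict matching to vfio-pci, then ask the PCI core to reprobe.
        echo vfio-pci > "$dev/driver_override"
        echo "$bdf" > "$sysfs/bus/pci/drivers_probe"
    }

    # Example: bind_to_vfio_pci 0000:02:00.0
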
Step 2: configure QEMU
----------------------

Interactions with ``/dev/iommu`` are abstracted by a new iommufd
object (compiled in with the ``CONFIG_IOMMUFD`` option).

Any QEMU device (e.g. a VFIO device) wishing to use ``/dev/iommu`` must
be linked with an iommufd object. Such a device gets a new optional property
named ``iommufd``, through which an iommufd object is passed. Taking the
``vfio-pci`` device as an example:

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note that ``/dev/iommu`` and the VFIO cdev can be opened externally by a
management layer. In that case the fds are passed to QEMU through the ``fd``
property, which accepts either a string naming the fd or a number, for
example:

.. code-block:: bash

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the ``fd`` property is not passed, the fd is opened by QEMU.

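A management layer written as a shell wrapper could open both nodes itself and
let QEMU inherit them. The sketch below assumes bash (for two-digit fd
numbers in redirections); ``open_passed_fds`` is a hypothetical helper and
``vfio0`` is a placeholder for the cdev of the assigned device.

.. code-block:: bash

    # Open the iommufd node on fd 22 and the VFIO cdev on fd 23, read-write,
    # in the current shell so that a QEMU started from it inherits both.
    open_passed_fds() {
        exec 22<>"$1"
        exec 23<>"$2"
    }

    # Example (as root, other QEMU options omitted):
    #   open_passed_fds /dev/iommu /dev/vfio/devices/vfio0
    #   exec qemu-system-x86_64 \
    #       -object iommufd,id=iommufd0,fd=22 \
    #       -device vfio-pci,iommufd=iommufd0,fd=23
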
If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
is not used and the user gets the behavior based on the legacy VFIO
container:

.. code-block:: bash

    -device vfio-pci,host=0000:02:00.0

Supported platform
==================

x86, ARM and s390x are currently supported.

Caveats
=======

Dirty page sync
---------------

Dirty page sync is not yet supported with the iommufd backend, so live
migration is disabled by default. It can be force-enabled as shown below,
although it is inefficient.

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on

P2P DMA
-------

PCI p2p DMA is unsupported because IOMMUFD doesn't yet support mapping
hardware PCI BAR regions. The warning below is shown for an assigned PCI
device; it is not a bug.

.. code-block:: none

    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)

FD passing with mdev
--------------------

The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether the
backend is an mdev. If FD passing is used, there is no way to know that, and
the mdev is treated like a real PCI device. The error below is reported if
the user tries to enable RAM discarding for the mdev.

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices

The ``vfio-ap`` and ``vfio-ccw`` devices don't have this issue, as their
backend devices are always mdevs and RAM discarding is force-enabled.