24563a587f
This documents the overall XIVE architecture and the XIVE support for sPAPR guest machines (pseries). It also provides documentation on the 'info pic' command. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190521082411.24719-1-clg@kaod.org> Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
175 lines
5.9 KiB
ReStructuredText
XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  The hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  The hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

The interrupt mode used by the machine is negotiated with the guest
O/S during the Client Architecture Support (CAS) negotiation sequence.
The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in byte 23 of the device
tree property "ibm,arch-vec-5-platform-support", and the OS selection
for XIVE is indicated in byte 23 of the "ibm,architecture-vec-5"
property.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode``, which can be set on the command line. It can take the
following values: ``xics``, ``xive`` and ``dual``. Currently ``xics``
is the default, but this may change in the future.

The chosen interrupt mode is activated by a reconfiguration performed
during a machine reset.
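The negotiation can be modelled in a few lines. This is a minimal Python sketch, not QEMU's code: the byte 23 values (``0x00`` legacy only, ``0x40`` XIVE only, ``0x80`` both) are assumptions modelled on QEMU's ``SPAPR_OV5_XIVE_*`` constants, and ``negotiate_mode()`` and ``PLATFORM_SUPPORT`` are hypothetical names. A guest selection that the ``ic-mode`` setting does not allow is treated as an error.

```python
# Hypothetical model of the XIVE part of CAS negotiation (not QEMU code).
# Assumed byte 23 encodings, modelled on QEMU's SPAPR_OV5_XIVE_* values.
OV5_XIVE_LEGACY = 0x00   # platform: XICS (legacy) mode only
OV5_XIVE_EXPLOIT = 0x40  # platform: XIVE only / guest: XIVE selected
OV5_XIVE_BOTH = 0x80     # platform: both modes, the guest chooses

# Byte 23 advertised in "ibm,arch-vec-5-platform-support" per ic-mode value.
PLATFORM_SUPPORT = {
    "xics": OV5_XIVE_LEGACY,
    "xive": OV5_XIVE_EXPLOIT,
    "dual": OV5_XIVE_BOTH,
}


def negotiate_mode(ic_mode: str, guest_byte23: int) -> str:
    """Return the mode activated at the next reset, given the ic-mode
    property and byte 23 of the guest's "ibm,architecture-vec-5"."""
    support = PLATFORM_SUPPORT[ic_mode]
    wants_xive = bool(guest_byte23 & OV5_XIVE_EXPLOIT)
    if wants_xive and support == OV5_XIVE_LEGACY:
        raise ValueError("guest selected XIVE but ic-mode=xics")
    if not wants_xive and support == OV5_XIVE_EXPLOIT:
        raise ValueError("guest selected XICS but ic-mode=xive")
    return "xive" if wants_xive else "xics"


if __name__ == "__main__":
    print(negotiate_mode("dual", OV5_XIVE_EXPLOIT))  # xive
```

For instance, a machine started with ``-machine pseries,ic-mode=dual`` advertises both modes, and the guest's selection decides which one is activated at the next reset.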
XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the sizes of the event queues. One cell per supported size, containing
  the log2 of the size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the interrupt number ranges assigned to the guest for the IPIs.
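Device tree property values are sequences of big-endian 32-bit cells, so the two XIVE-specific properties above could be built as follows. This is an illustrative sketch: the helper names are hypothetical, and encoding ``ibm,xive-lisn-ranges`` as (first LISN, count) pairs is an assumption, since the property description does not give the cell layout.

```python
import struct


def be32_cells(values):
    """Pack integers as big-endian 32-bit device tree cells."""
    return b"".join(struct.pack(">I", v) for v in values)


def xive_eq_sizes(sizes_in_bytes):
    """ibm,xive-eq-sizes: one cell per supported size, log2 of the size,
    in ascending order."""
    log2s = []
    for size in sizes_in_bytes:
        if size <= 0 or size & (size - 1):
            raise ValueError(f"EQ size {size} is not a power of two")
        log2s.append(size.bit_length() - 1)
    return be32_cells(sorted(log2s))


def xive_lisn_ranges(ranges):
    """ibm,xive-lisn-ranges, ASSUMING each range is a (first LISN, count) pair."""
    cells = []
    for first, count in ranges:
        cells += [first, count]
    return be32_cells(cells)


# Illustrative values only: 4K and 64K queues, IPIs 0..3 for a 4-vCPU guest.
print(xive_eq_sizes([65536, 4096]).hex())  # 0000000c00000010
print(xive_lisn_ranges([(0, 4)]).hex())    # 0000000000000004
```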
The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.
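A guest O/S has to avoid the reserved priorities when it configures its event queues. The sketch below assumes that ``ibm,plat-res-int-priorities`` is a list of (start, count) pairs and that XIVE priorities range from 0 (most favored) to 7; both are assumptions, and the function names are hypothetical.

```python
XIVE_PRIORITY_MAX = 7  # assumed: priorities 0 (most favored) .. 7


def reserved_priorities(cells):
    """Expand ibm,plat-res-int-priorities, ASSUMED to hold (start, count)
    pairs, into the set of priorities reserved by the hypervisor."""
    if len(cells) % 2:
        raise ValueError("expected (start, count) pairs")
    reserved = set()
    for start, count in zip(cells[0::2], cells[1::2]):
        reserved.update(range(start, start + count))
    return reserved


def usable_priorities(cells):
    """Priorities left for the guest O/S event queues."""
    reserved = reserved_priorities(cells)
    return [p for p in range(XIVE_PRIORITY_MAX + 1) if p not in reserved]


print(usable_priorities([7, 0xF8]))  # [0, 1, 2, 3, 4, 5, 6]
```

With an illustrative property value of ``<7 0xf8>``, only priorities 0 to 6 remain available to the guest.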
IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the
same for both interrupt modes. The different ranges are defined as
follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32 PHBs devices (4 LSIs per PHB)
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs
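The layout table translates directly into a lookup. The following Python sketch classifies an IRQ number; ``classify_irq()`` and ``phb_lsi()`` are hypothetical helpers, and ``phb_lsi()`` assumes four LSIs (INTA to INTD) per PHB, allocated in PHB index order.

```python
IRQ_SPACE_SIZE = 0x2000  # 8K IRQ numbers, shared by XICS and XIVE

# (first, last, owner) from the IRQ number space layout.
IRQ_RANGES = [
    (0x0000, 0x0FFF, "CPU IPI"),
    (0x1000, 0x1000, "EPOW"),
    (0x1001, 0x1001, "HOTPLUG"),
    (0x1100, 0x11FF, "VIO device"),
    (0x1200, 0x127F, "PHB LSI"),
    (0x1280, 0x12FF, "unused"),
    (0x1300, 0x1FFF, "PHB MSI"),
]


def classify_irq(irq):
    """Return the owner of an IRQ number in the pseries layout."""
    if not 0 <= irq < IRQ_SPACE_SIZE:
        raise ValueError(f"IRQ {irq:#x} is outside the 8K number space")
    for first, last, owner in IRQ_RANGES:
        if first <= irq <= last:
            return owner
    return "unassigned"  # gaps such as 0x1002..0x10FF


def phb_lsi(irq):
    """(PHB index, pin) of a PHB LSI, ASSUMING 4 pins (INTA..INTD) per PHB."""
    if classify_irq(irq) != "PHB LSI":
        raise ValueError(f"IRQ {irq:#x} is not a PHB LSI")
    return divmod(irq - 0x1200, 4)
```

For example, ``classify_irq(0x1001)`` returns ``"HOTPLUG"`` and ``phb_lsi(0x1205)`` returns ``(1, 1)``, the second pin of the second PHB.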
Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

   (qemu) info pic
   CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
   CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
   CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
   CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
   CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
   ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line, which is set to the VP identifier.
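When scripting around ``info pic``, the per-CPU rows can be parsed and the ``W2`` word of the OS ring split into its parts. This sketch assumes that the top bit of ``W2`` is a valid bit and that the low 24 bits hold the CAM line (the VP identifier); that bit layout is an assumption rather than something stated in this document, and the function names are hypothetical.

```python
TCTX_FIELDS = ["NSR", "CPPR", "IPB", "LSMFB", "ACK#", "INC", "AGE", "PIPR", "W2"]


def parse_tctx_row(line):
    """Parse one data row of the per-CPU dump into (ring, {field: value})."""
    _, rest = line.split(":", 1)          # drop the "CPU[xxxx]" prefix
    ring, *values = rest.split()
    if len(values) != len(TCTX_FIELDS):
        raise ValueError(f"unexpected row: {line!r}")
    return ring, {f: int(v, 16) for f, v in zip(TCTX_FIELDS, values)}


def decode_os_w2(w2):
    """ASSUMED layout: bit 31 = valid bit, bits 0..23 = CAM line (VP id)."""
    valid = bool((w2 >> 31) & 1)
    cam = w2 & 0x00FF_FFFF
    return valid, cam


ring, regs = parse_tctx_row(
    "CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400")
print(ring, decode_os_w2(regs["W2"]))  # OS (True, 1024)
```

Applied to the ``OS`` row of the sample dump, this reports a valid CAM line with VP identifier ``0x400``.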
Then comes the routing information, which aggregates the EAS and the
END configuration:

::

   ...
   LISN         PQ    EISN     CPU/PRIO EQ
   00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
   00000004 MSI -Q  M 00000000
   00000005 MSI -Q  M 00000000
   00000006 MSI -Q  M 00000000
   00000007 MSI -Q  M 00000000
   00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001101 MSI -Q  M 00000000
   00001200 LSI -Q  M 00000000
   00001201 LSI -Q  M 00000000
   00001202 LSI -Q  M 00000000
   00001203 LSI -Q  M 00000000
   00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``.
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.
The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the max number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
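The ``EQ`` column can be decoded in the same spirit. This sketch assumes that each event queue entry is a 32-bit word holding the generation (toggle) bit in bit 31 and the EISN in the low 31 bits, which matches the sample output where ``80000010`` carries EISN ``0x10``. Under that assumption, 16384 entries make up a 64K queue. The regular expression and function names are hypothetical.

```python
import re

# Matches e.g. "380/16384 @1fe3e0000 ^1 [ 80000010 ... ]"
EQ_RE = re.compile(
    r"(?P<index>\d+)/(?P<entries>\d+)\s+@(?P<addr>[0-9a-fA-F]+)\s+"
    r"\^(?P<toggle>[01])\s+\[(?P<last>[^\]]*)\]"
)


def decode_eq_entry(word):
    """ASSUMED entry layout: generation bit 31, EISN in bits 0..30."""
    return (word >> 31) & 1, word & 0x7FFF_FFFF


def parse_eq_column(text):
    """Split the EQ column into index, size, address, toggle and entries."""
    m = EQ_RE.search(text)
    if m is None:
        raise ValueError(f"not an EQ column: {text!r}")
    last = [decode_eq_entry(int(tok, 16))
            for tok in m.group("last").split() if tok != "..."]
    return {
        "index": int(m.group("index")),
        "entries": int(m.group("entries")),
        "qaddr": int(m.group("addr"), 16),
        "toggle": int(m.group("toggle")),
        "last": last,  # list of (generation, EISN)
    }


print(parse_eq_column("380/16384 @1fe3e0000 ^1 [ 80000010 ... ]"))
```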