ppc 7.0 queue:

* More documentation updates (Leonardo)
* Fixes for the 7448 CPU (Fabiano and Cedric)
* Final removal of 403 CPUs and the .load_state_old handler (Cedric)
* More cleanups of PHB4 models (Daniel and Cedric)

Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220118' into staging

# gpg: Signature made Tue 18 Jan 2022 11:59:16 GMT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* remotes/legoater/tags/pull-ppc-20220118: (31 commits)
  ppc/pnv: Remove PHB4 version property
  ppc/pnv: Add a 'rp_model' class attribute for the PHB4 PEC
  ppc/pnv: Move root port allocation under pnv_pec_default_phb_realize()
  ppc/pnv: rename pnv_pec_stk_update_map()
  ppc/pnv: remove PnvPhb4PecStack object
  ppc/pnv: make PECs create and realize PHB4s
  ppc/pnv: remove PnvPhb4PecStack::stack_no
  ppc/pnv: move default_phb_realize() to pec_realize()
  ppc/pnv: remove stack pointer from PnvPHB4
  ppc/pnv: reduce stack->stack_no usage
  ppc/pnv: introduce PnvPHB4 'pec' property
  ppc/pnv: move phb_regs_mr to PnvPHB4
  ppc/pnv: move nest_regs_mr to PnvPHB4
  ppc/pnv: change pnv_pec_stk_update_map() to use PnvPHB4
  ppc/pnv: move nest_regs[] to PnvPHB4
  ppc/pnv: move mmbar0/mmbar1 and friends to PnvPHB4
  ppc/pnv: change pnv_phb4_update_regions() to use PnvPHB4
  ppc/pnv: move intbar to PnvPHB4
  ppc/pnv: move phbbar to PnvPHB4
  ppc/pnv: move PCI registers to PnvPHB4
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell 2022-01-18 19:43:33 +00:00
commit 0dabdd6b3a
15 changed files with 938 additions and 945 deletions


@ -0,0 +1,510 @@
=============================
sPAPR Dynamic Reconfiguration
=============================
sPAPR or pSeries guests make use of a facility called dynamic reconfiguration
to handle hot plugging of dynamic "physical" resources like PCI cards, or
"logical"/para-virtual resources like memory, CPUs, and "physical"
host-bridges, which are generally managed by the host/hypervisor and provided
to guests as virtualized resources. The specifics of dynamic reconfiguration
are documented extensively in section 13 of the Linux on Power Architecture
Reference document ([LoPAR]_). This document provides a summary of that
information as it applies to the implementation within QEMU.
Dynamic-reconfiguration Connectors
==================================
To manage hot plug/unplug of these resources, a firmware abstraction known as
a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
resource to the guest, and provide an interface for the guest to manage
configuration/removal of the resource associated with it.
Device tree description of DRCs
===============================
A set of four Open Firmware device tree array properties are used to describe
the name/index/power-domain/type of each DRC allocated to a guest at
boot time. There may be multiple sets of these arrays, rooted at different
paths in the device tree depending on the type of resource the DRCs manage.
In some cases, the DRCs themselves may be provided by a dynamic resource,
such as the DRCs managing PCI slots on a hot plugged PHB. In this case the
arrays would be fetched as part of the device tree retrieval interfaces
for hot plugged resources described under :ref:`guest-host-interface`.
The array properties are described below. Each entry/element in an array
describes the DRC identified by the element in the corresponding position
of ``ibm,drc-indexes``:
``ibm,drc-names``
-----------------
First 4-bytes: big-endian (BE) encoded integer denoting the number of entries.
Each entry: a NULL-terminated ``<name>`` string encoded as a byte array.
``<name>`` values for logical/virtual resources are defined in the Linux on
Power Architecture Reference ([LoPAR]_) section 13.5.2.4, and basically
consist of the type of the resource followed by a space and a numerical
value that's unique across resources of that type.
``<name>`` values for "physical" resources such as PCI or VIO devices are
defined as being "location codes", which are the "location labels" of each
encapsulating device, starting from the chassis down to the individual slot
for the device, concatenated by a hyphen. This provides a mapping of
resources to a physical location in a chassis for debugging purposes. For
QEMU, this mapping is less important, so we assign a location code that
conforms to naming specifications, but is simply a location label for the
slot by itself to simplify the implementation. The naming convention for
location labels is documented in detail in the [LoPAR]_ section 12.3.1.5,
and in our case amounts to using ``C<n>`` for PCI/VIO device slots, where
``<n>`` is unique across all PCI/VIO device slots.
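The count-prefixed string layout described above can be walked with a few lines of C. The following is only a sketch: ``be32_read()`` and ``drc_strings_get()`` are hypothetical helper names, not QEMU functions, and the caller is assumed to already hold the raw property bytes and their length. ``ibm,drc-types`` uses the same framing, so the same helper would apply there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the n-th string of a count-prefixed string array property
 * (hypothetical helper). Returns NULL if n is not below the entry count
 * or if the buffer ends before the string is terminated.
 */
static const char *drc_strings_get(const uint8_t *prop, size_t len, uint32_t n)
{
    size_t off = 4;                 /* strings start after the count */

    if (len < 4 || n >= be32_read(prop)) {
        return NULL;
    }
    for (;;) {
        const uint8_t *end = memchr(prop + off, '\0', len - off);

        if (end == NULL) {
            return NULL;            /* missing terminator */
        }
        if (n == 0) {
            return (const char *)(prop + off);
        }
        n--;
        off = (size_t)(end - prop) + 1;
    }
}
```

Because the entries are variable-length, finding entry ``n`` requires scanning past the preceding ``n`` terminators; there is no fixed stride as there is for ``ibm,drc-indexes``.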
``ibm,drc-indexes``
-------------------
First 4-bytes: BE-encoded integer denoting the number of entries.
Each 4-byte entry: BE-encoded ``<index>`` integer that is unique across all
DRCs in the machine.
``<index>`` is arbitrary, but in the case of QEMU we try to maintain the
convention used to assign them to pSeries guests on pHyp (the hypervisor
portion of PowerVM):
``bit[31:28]``: integer encoding of ``<type>``, where ``<type>`` is:
``1`` for CPU resource.
``2`` for PHB resource.
``3`` for VIO resource.
``4`` for PCI resource.
``8`` for memory resource.
``bit[27:0]``: integer encoding of ``<id>``, where ``<id>`` is unique
across all resources of specified type.
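As an illustration of the bit layout above, a DRC index can be packed and unpacked as follows. This is a minimal sketch; the enum and function names are made up for this example and are not taken from the QEMU sources.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes from the bit[31:28] list above; names are illustrative. */
enum drc_type_code {
    DRC_TYPE_CODE_CPU = 1,
    DRC_TYPE_CODE_PHB = 2,
    DRC_TYPE_CODE_VIO = 3,
    DRC_TYPE_CODE_PCI = 4,
    DRC_TYPE_CODE_MEM = 8,
};

#define DRC_TYPE_SHIFT 28
#define DRC_ID_MASK    0x0fffffffu

/* Pack a type code and a per-type id into a 32-bit DRC index. */
static uint32_t drc_index_make(enum drc_type_code type, uint32_t id)
{
    assert(id <= DRC_ID_MASK);      /* id must fit in bit[27:0] */
    return ((uint32_t)type << DRC_TYPE_SHIFT) | id;
}

/* Extract the type code stored in bit[31:28]. */
static uint32_t drc_index_type(uint32_t index)
{
    return index >> DRC_TYPE_SHIFT;
}

/* Extract the per-type id stored in bit[27:0]. */
static uint32_t drc_index_id(uint32_t index)
{
    return index & DRC_ID_MASK;
}
```

For example, the memory resource with id 5 would get index ``0x80000005`` under this convention.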
``ibm,drc-power-domains``
-------------------------
First 4-bytes: BE-encoded integer denoting the number of entries.
Each 4-byte entry: 32-bit, BE-encoded ``<index>`` integer that specifies the
power domain the resource will be assigned to. In the case of QEMU we
associate all resources with a "live insertion" domain, where the power is
assumed to be managed automatically. The integer value for this domain is a
special value of ``-1``.
``ibm,drc-types``
-----------------
First 4-bytes: BE-encoded integer denoting the number of entries.
Each entry: a NULL-terminated ``<type>`` string encoded as a byte array.
``<type>`` is assigned as follows:
"CPU" for a CPU.
"PHB" for a physical host-bridge.
"SLOT" for a VIO slot.
"28" for a PCI slot.
"MEM" for memory resource.
.. _guest-host-interface:
Guest->Host interface to manage dynamic resources
=================================================
Each DRC is given a globally unique DRC index, and resources associated with a
particular DRC are configured/managed by the guest via a number of RTAS calls
which reference individual DRCs based on the DRC index. This can be considered
the guest->host interface.
``rtas-set-power-level``
------------------------
Set the power level for a specified power domain.
``arg[0]``: integer identifying power domain.
``arg[1]``: new power level for the domain, ``0-100``.
``output[0]``: status, ``0`` on success.
``output[1]``: power level after command.
``rtas-get-power-level``
------------------------
Get the power level for a specified power domain.
``arg[0]``: integer identifying power domain.
``output[0]``: status, ``0`` on success.
``output[1]``: current power level.
``rtas-set-indicator``
----------------------
Set the state of an indicator or sensor.
``arg[0]``: integer identifying sensor/indicator type.
``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
index.
``arg[2]``: desired sensor value.
``output[0]``: status, ``0`` on success.
For the purpose of this document we focus on the indicator/sensor types
associated with a DRC. The types are:
* ``9001``: ``isolation-state``, controls/indicates whether a device has been
made accessible to a guest. Supported sensor values:
``0``: ``isolate``, device is made inaccessible by guest OS.
``1``: ``unisolate``, device is made available to guest OS.
* ``9002``: ``dr-indicator``, controls "visual" indicator associated with
device. Supported sensor values:
``0``: ``inactive``, resource may be safely removed.
``1``: ``active``, resource is in use and cannot be safely removed.
``2``: ``identify``, used to visually identify slot for interactive hot plug.
``3``: ``action``, in most cases, used in the same manner as identify.
* ``9003``: ``allocation-state``, generally only used for "logical" DR resources
to request the allocation/deallocation of a resource prior to acquiring it via
``isolation-state->unisolate``, or after releasing it via
``isolation-state->isolate``, respectively. For "physical" DR (like PCI
hot plug/unplug) the pre-allocation of the resource is implied and this sensor
is unused. Supported sensor values:
``0``: ``unusable``, tell firmware/system the resource can be
unallocated/reclaimed and added back to the system resource pool.
``1``: ``usable``, request the resource be allocated/reserved for use by
guest OS.
``2``: ``exchange``, used to allocate a spare resource to use for fail-over
in certain situations. Unused in QEMU.
``3``: ``recover``, used to reclaim a previously allocated resource that's
not currently allocated to the guest OS. Unused in QEMU.
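The indicator types and their value ranges listed above can be captured in a small validation helper. This is a sketch only: the enum and function names are hypothetical, and it checks nothing beyond whether a value appears in the lists above (it does not model which values QEMU actually acts on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Indicator type tokens from the list above; names are illustrative. */
enum dr_indicator_type {
    DR_ISOLATION_STATE  = 9001,
    DR_INDICATOR        = 9002,
    DR_ALLOCATION_STATE = 9003,
};

/* Return true if value is one of those listed for the indicator type. */
static bool dr_indicator_value_valid(uint32_t type, uint32_t value)
{
    switch (type) {
    case DR_ISOLATION_STATE:
        return value <= 1;          /* isolate, unisolate */
    case DR_INDICATOR:
    case DR_ALLOCATION_STATE:
        return value <= 3;          /* four values each */
    default:
        return false;               /* not a DRC-related indicator */
    }
}
```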
``rtas-get-sensor-state``
-------------------------
Used to read an indicator or sensor value.
``arg[0]``: integer identifying sensor/indicator type.
``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
index.
``output[0]``: status, ``0`` on success.
For DR-related operations, the only noteworthy sensor is ``dr-entity-sense``,
which has a type value of ``9003``, as ``allocation-state`` does in the case of
``rtas-set-indicator``. The semantics/encodings of the sensor values are
distinct, however.
Supported sensor values for ``dr-entity-sense`` (``9003``) sensor:
``0``: empty.
For physical resources: DRC/slot is empty.
For logical resources: unused.
``1``: present.
For physical resources: DRC/slot is populated with a device/resource.
For logical resources: resource has been allocated to the DRC.
``2``: unusable.
For physical resources: unused.
For logical resources: DRC has no resource allocated to it.
``3``: exchange.
For physical resources: unused.
For logical resources: resource available for exchange (see
``allocation-state`` sensor semantics above).
``4``: recovery.
For physical resources: unused.
For logical resources: resource available for recovery (see
``allocation-state`` sensor semantics above).
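The ``dr-entity-sense`` values above depend on whether the DRC manages a physical or a logical resource. A guest-side decoder might look like the sketch below; the function names are invented for illustration and only restate the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the name of a dr-entity-sense value, or NULL if it is unknown. */
static const char *dr_entity_sense_name(uint32_t value)
{
    static const char *const names[] = {
        "empty", "present", "unusable", "exchange", "recovery",
    };

    if (value >= sizeof(names) / sizeof(names[0])) {
        return NULL;
    }
    return names[value];
}

/*
 * Return true if the value is meaningful for the kind of resource:
 * physical DRCs report only empty/present, logical DRCs report
 * everything except empty.
 */
static bool dr_entity_sense_applies(uint32_t value, bool logical)
{
    if (logical) {
        return value >= 1 && value <= 4;
    }
    return value <= 1;
}
```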
``rtas-ibm-configure-connector``
--------------------------------
Used to fetch an OpenFirmware device tree description of the resource associated
with a particular DRC.
``arg[0]``: guest physical address of 4096-byte work area buffer.
``arg[1]``: 0, or address of additional 4096-byte work area buffer; only
non-zero if a prior RTAS response indicated a need for additional memory.
``output[0]``: status:
``0``: completed transmittal of device tree node.
``1``: instruct guest to prepare for next device tree sibling node.
``2``: instruct guest to prepare for next device tree child node.
``3``: instruct guest to prepare for next device tree property.
``4``: instruct guest to ascend to parent device tree node.
``5``: instruct guest to provide additional work-area buffer via ``arg[1]``.
``990x``: instruct guest that operation took too long and to try again
later.
The DRC index is encoded in the first 4-bytes of the first work area buffer.
Work area (``wa``) layout, using 4-byte offsets:
``wa[0]``: DRC index of the DRC to fetch device tree nodes from.
``wa[1]``: ``0`` (hard-coded).
``wa[2]``:
For next-sibling/next-child response:
``wa`` offset of null-terminated string denoting the new node's name.
For next-property response:
``wa`` offset of null-terminated string denoting new property's name.
``wa[3]``: for next-property response (unused otherwise):
Byte-length of new property's value.
``wa[4]``: for next-property response (unused otherwise):
New property's value, encoded as an OFDT-compatible byte array.
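To make the work area layout more concrete, the sketch below reads individual ``wa`` words and resolves the name string that ``wa[2]`` refers to. It assumes the offset stored in ``wa[2]`` is a byte offset from the start of the 4096-byte work area and that all words are big-endian; the enum and helper names are hypothetical and not taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CC_WA_SIZE 4096

/* Status codes from the output[0] list above; names are illustrative. */
enum cc_status {
    CC_COMPLETE      = 0,
    CC_NEXT_SIBLING  = 1,
    CC_NEXT_CHILD    = 2,
    CC_NEXT_PROPERTY = 3,
    CC_PREV_PARENT   = 4,
    CC_MORE_MEMORY   = 5,
};

/* Read a big-endian 32-bit integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read the 4-byte word wa[i]; i must be below CC_WA_SIZE / 4. */
static uint32_t cc_wa_word(const uint8_t *wa, unsigned i)
{
    return be32_read(wa + 4 * (size_t)i);
}

/*
 * Return the node or property name that wa[2] points at, or NULL if the
 * offset lies outside the work area or the string is not terminated.
 */
static const char *cc_wa_name(const uint8_t *wa)
{
    uint32_t off = cc_wa_word(wa, 2);

    if (off >= CC_WA_SIZE ||
        memchr(wa + off, '\0', CC_WA_SIZE - off) == NULL) {
        return NULL;
    }
    return (const char *)(wa + off);
}
```

A guest would typically call ``cc_wa_name()`` after a next-sibling, next-child or next-property status, and additionally read ``cc_wa_word(wa, 3)`` for the property length in the next-property case.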
Hot plug/unplug events
======================
For most DR operations, the hypervisor will issue host->guest add/remove events
using the EPOW/check-exception notification framework, where the host issues a
check-exception interrupt, then provides an RTAS event log via an
rtas-check-exception call issued by the guest in response. This framework is
documented by PAPR+ v2.7, and is already in use by QEMU for generating powerdown
requests via EPOW events.
For DR, this framework has been extended to include hotplug events, which were
previously unneeded due to direct manipulation of DR-related guest userspace
tools by host-level management such as an HMC. This level of management is not
applicable to KVM on Power, hence the reason for extending the notification
framework to support hotplug events.
The format for these EPOW-signalled events is described below under
:ref:`hot-plug-unplug-event-structure`. Note that these events are not formally
part of the PAPR+ specification, and have been superseded by a newer format,
also described below under :ref:`hot-plug-unplug-event-structure`, and so are
now deemed a "legacy" format. The formats are similar, but the "modern" format
contains additional fields/flags, which are denoted for the purposes of this
documentation with ``#ifdef GUEST_SUPPORTS_MODERN`` guards.
QEMU should assume support only for "legacy" fields/flags unless the guest
advertises support for the "modern" format via
``ibm,client-architecture-support`` hcall by setting byte 5, bit 6 of its
``ibm,architecture-vec-5`` option vector structure (as described by [LoPAR]_,
section B.5.2.3). As with "legacy" format events, "modern" format events are
surfaced to the guest via check-exception RTAS calls, but use a dedicated event
source to signal the guest. This event source is advertised to the guest by the
addition of a ``hot-plug-events`` node under the ``/event-sources`` node of the
guest's device tree using the standard format described in [LoPAR]_,
section B.5.12.2.
.. _hot-plug-unplug-event-structure:
Hot plug/unplug event structure
===============================
The hot plug specific payload in QEMU is implemented as follows (with all values
encoded in big-endian format):
.. code-block:: c

   struct rtas_event_log_v6_hp {
   #define SECTION_ID_HOTPLUG              0x4850 /* HP */
       struct section_header {
           uint16_t section_id;             /* set to SECTION_ID_HOTPLUG */
           uint16_t section_length;         /* sizeof(rtas_event_log_v6_hp),
                                             * plus the length of the DRC name
                                             * if a DRC name identifier is
                                             * specified for hotplug_identifier
                                             */
           uint8_t section_version;         /* version 1 */
           uint8_t section_subtype;         /* unused */
           uint16_t creator_component_id;   /* unused */
       } hdr;
   #define RTAS_LOG_V6_HP_TYPE_CPU         1
   #define RTAS_LOG_V6_HP_TYPE_MEMORY      2
   #define RTAS_LOG_V6_HP_TYPE_SLOT        3
   #define RTAS_LOG_V6_HP_TYPE_PHB         4
   #define RTAS_LOG_V6_HP_TYPE_PCI         5
       uint8_t hotplug_type;                /* type of resource/device */
   #define RTAS_LOG_V6_HP_ACTION_ADD       1
   #define RTAS_LOG_V6_HP_ACTION_REMOVE    2
       uint8_t hotplug_action;              /* action (add/remove) */
   #define RTAS_LOG_V6_HP_ID_DRC_NAME          1
   #define RTAS_LOG_V6_HP_ID_DRC_INDEX         2
   #define RTAS_LOG_V6_HP_ID_DRC_COUNT         3
   #ifdef GUEST_SUPPORTS_MODERN
   #define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
   #endif
       uint8_t hotplug_identifier;          /* type of the resource identifier,
                                             * which serves as the discriminator
                                             * for the 'drc' union field below
                                             */
   #ifdef GUEST_SUPPORTS_MODERN
       uint8_t capabilities;                /* capability flags, currently unused
                                             * by QEMU
                                             */
   #else
       uint8_t reserved;
   #endif
       union {
           uint32_t index;                  /* DRC index of resource to take action
                                             * on
                                             */
           uint32_t count;                  /* number of DR resources to take
                                             * action on (guest chooses which)
                                             */
   #ifdef GUEST_SUPPORTS_MODERN
           struct {
               uint32_t count;              /* number of DR resources to take
                                             * action on
                                             */
               uint32_t index;              /* DRC index of first resource to take
                                             * action on. guest will take action
                                             * on DRC index <index> through
                                             * DRC index <index + count - 1> in
                                             * sequential order
                                             */
           } count_indexed;
   #endif
           char name[1];                    /* string representing the name of the
                                             * DRC to take action on
                                             */
       } drc;
   } QEMU_PACKED;

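As a worked example of the structure above, the following sketch serializes a "legacy" format section that identifies the resource by DRC index. It writes bytes directly instead of relying on struct packing. The function name is hypothetical, and the 16-byte length assumes the legacy layout, in which the ``drc`` union is 4 bytes wide (an 8-byte header, four 1-byte fields, and a 4-byte index).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_ID_HOTPLUG            0x4850 /* HP */
#define RTAS_LOG_V6_HP_TYPE_MEMORY    2
#define RTAS_LOG_V6_HP_ACTION_ADD     1
#define RTAS_LOG_V6_HP_ID_DRC_INDEX   2

/* Assumed size of a legacy-format section carrying a DRC index. */
#define HP_LEGACY_INDEX_LEN           16

/* Store a 16-bit value in big-endian byte order. */
static void be16_write(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Store a 32-bit value in big-endian byte order. */
static void be32_write(uint8_t *p, uint32_t v)
{
    be16_write(p, (uint16_t)(v >> 16));
    be16_write(p + 2, (uint16_t)(v & 0xffff));
}

/*
 * Fill buf (at least HP_LEGACY_INDEX_LEN bytes) with a hotplug section
 * that targets one DRC index. Returns the number of bytes written.
 */
static size_t hp_section_by_index(uint8_t *buf, uint8_t type, uint8_t action,
                                  uint32_t drc_index)
{
    be16_write(buf + 0, SECTION_ID_HOTPLUG);    /* hdr.section_id */
    be16_write(buf + 2, HP_LEGACY_INDEX_LEN);   /* hdr.section_length */
    buf[4] = 1;                                 /* hdr.section_version */
    buf[5] = 0;                                 /* hdr.section_subtype */
    be16_write(buf + 6, 0);                     /* hdr.creator_component_id */
    buf[8] = type;                              /* hotplug_type */
    buf[9] = action;                            /* hotplug_action */
    buf[10] = RTAS_LOG_V6_HP_ID_DRC_INDEX;      /* hotplug_identifier */
    buf[11] = 0;                                /* reserved */
    be32_write(buf + 12, drc_index);            /* drc.index */
    return HP_LEGACY_INDEX_LEN;
}
```

The first two bytes of the result are the ASCII characters ``HP``, which is where the ``0x4850`` section ID comes from.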
``ibm,lrdr-capacity``
=====================
``ibm,lrdr-capacity`` is a property in the ``/rtas`` device tree node that
identifies the dynamic reconfiguration capabilities of the guest. It is a
triple of ``<phys>``, ``<size>`` and ``<maxcpus>``.
``<phys>``, encoded in BE format, represents the maximum address in bytes and
hence the maximum memory that can be allocated to the guest.
``<size>``, encoded in BE format, represents the size increments in which
memory can be hot-plugged to the guest.
``<maxcpus>``, a BE-encoded integer, represents the maximum number of
processors that the guest can have.
``pseries`` guests use this property to note the maximum allowed CPUs for the
guest.
``ibm,dynamic-reconfiguration-memory``
======================================
``ibm,dynamic-reconfiguration-memory`` is a device tree node that represents
dynamically reconfigurable logical memory blocks (LMB). This node is generated
only when the guest advertises the support for it via
``ibm,client-architecture-support`` call. Memory that is not dynamically
reconfigurable is represented by ``/memory`` nodes. The properties of this node
that are of interest to the sPAPR memory hotplug implementation in QEMU are
described here.
``ibm,lmb-size``
----------------
This 64-bit integer defines the size of each dynamically reconfigurable LMB.
``ibm,associativity-lookup-arrays``
-----------------------------------
This property defines a lookup array in which the NUMA associativity
information for each LMB can be found. It is a property encoded array
that begins with an integer M, the number of associativity lists followed
by an integer N, the number of entries per associativity list and terminated
by M associativity lists each of length N integers.
This property provides the same information as given by ``ibm,associativity``
property in a ``/memory`` node. Each assigned LMB has an index value between
0 and M-1 which is used as an index into this table to select which
associativity list to use for the LMB. This index value for each LMB is defined
in ``ibm,dynamic-memory`` property.
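The lookup described above amounts to indexing a flat array of M lists with N cells each. The sketch below assumes every integer in the property is a 32-bit big-endian cell, which this document does not state explicitly; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return a pointer to associativity list number index inside the raw
 * property, and store the list length (in cells) in *n_out. Returns NULL
 * if index is not below M, if N is zero, or if the property is too short.
 */
static const uint8_t *assoc_list_get(const uint8_t *prop, size_t len,
                                     uint32_t index, uint32_t *n_out)
{
    uint32_t m, n;
    size_t cells;

    if (len < 8) {
        return NULL;
    }
    m = be32_read(prop);            /* number of lists */
    n = be32_read(prop + 4);        /* cells per list */
    cells = (len - 8) / 4;          /* cells available after the header */

    if (index >= m || n == 0 || n > cells || index >= cells / n) {
        return NULL;
    }
    *n_out = n;
    return prop + 8 + (size_t)index * n * 4;
}
```

The bounds check is written with divisions so that a malformed M or N cannot cause the offset computation to overflow.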
``ibm,dynamic-memory``
----------------------
This property describes the dynamically reconfigurable memory. It is a
property encoded array that has an integer N, the number of LMBs followed
by N LMB list entries.
Each LMB list entry consists of the following elements:
- Logical address of the start of the LMB encoded as a 64-bit integer. This
corresponds to ``reg`` property in ``/memory`` node.
- DRC index of the LMB that corresponds to ``ibm,my-drc-index`` property
in a ``/memory`` node.
- Four bytes reserved for expansion.
- Associativity list index for the LMB that is used as an index into
``ibm,associativity-lookup-arrays`` property described earlier. This is used
to retrieve the right associativity list to be used for this LMB.
- A 32-bit flags word. The bit at bit position ``0x00000008`` defines whether
the LMB is assigned to the partition as of boot time.
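Putting the element list above together, each LMB entry occupies 24 bytes if the count N, the DRC index and the associativity list index are each taken to be 32-bit big-endian cells (an assumption; only the address, reserved field and flags have explicit widths above). A parsing sketch with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LMB_ENTRY_SIZE     24           /* 8 + 4 + 4 + 4 + 4 bytes */
#define LMB_FLAG_ASSIGNED  0x00000008u  /* assigned to partition at boot */

struct lmb_entry {
    uint64_t addr;          /* logical start address of the LMB */
    uint32_t drc_index;
    uint32_t assoc_index;   /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;
};

/* Read a big-endian 32-bit integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode entry i of an ibm,dynamic-memory property; false if absent. */
static bool lmb_entry_read(const uint8_t *prop, size_t len, uint32_t i,
                           struct lmb_entry *out)
{
    const uint8_t *e;

    if (len < 4 || i >= be32_read(prop) || i >= (len - 4) / LMB_ENTRY_SIZE) {
        return false;
    }
    e = prop + 4 + (size_t)i * LMB_ENTRY_SIZE;
    out->addr = ((uint64_t)be32_read(e) << 32) | be32_read(e + 4);
    out->drc_index = be32_read(e + 8);
    /* e + 12: four bytes reserved for expansion, skipped */
    out->assoc_index = be32_read(e + 16);
    out->flags = be32_read(e + 20);
    return true;
}
```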
``ibm,dynamic-memory-v2``
-------------------------
This property describes the dynamically reconfigurable memory. This is
an alternate and newer way to describe dynamically reconfigurable memory.
It is a property encoded array that has an integer N (the number of
LMB set entries) followed by N LMB set entries. There is an LMB set entry
for each sequential group of LMBs that share common attributes.
Each LMB set entry consists of the following elements:
- Number of sequential LMBs in the entry represented by a 32-bit integer.
- Logical address of the first LMB in the set encoded as a 64-bit integer.
- DRC index of the first LMB in the set.
- Associativity list index that is used as an index into
``ibm,associativity-lookup-arrays`` property described earlier. This
is used to retrieve the right associativity list to be used for all
the LMBs in this set.
- A 32-bit flags word that applies to all the LMBs in the set.
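An LMB set entry can be expanded back into individual LMBs. The sketch below makes two assumptions that go beyond the text above: the DRC index and associativity index are 32-bit big-endian cells, and consecutive LMBs in a set have consecutive DRC indexes and addresses spaced by ``ibm,lmb-size``. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct lmb_set {
    uint32_t count;             /* number of sequential LMBs */
    uint64_t base_addr;         /* logical address of the first LMB */
    uint32_t first_drc_index;   /* DRC index of the first LMB */
    uint32_t assoc_index;
    uint32_t flags;
};

struct lmb {
    uint64_t addr;
    uint32_t drc_index;
};

/* Read a big-endian 32-bit integer from a byte buffer. */
static uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one 24-byte LMB set entry starting at e. */
static void lmb_set_read(const uint8_t *e, struct lmb_set *out)
{
    out->count = be32_read(e);
    out->base_addr = ((uint64_t)be32_read(e + 4) << 32) | be32_read(e + 8);
    out->first_drc_index = be32_read(e + 12);
    out->assoc_index = be32_read(e + 16);
    out->flags = be32_read(e + 20);
}

/* Compute the k-th LMB of a set; false if k is outside the set. */
static bool lmb_set_member(const struct lmb_set *set, uint64_t lmb_size,
                           uint32_t k, struct lmb *out)
{
    if (k >= set->count) {
        return false;
    }
    out->addr = set->base_addr + (uint64_t)k * lmb_size;
    out->drc_index = set->first_drc_index + k;
    return true;
}
```

Compared with ``ibm,dynamic-memory``, a run of LMBs with identical attributes costs one 24-byte entry here instead of 24 bytes per LMB.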


@ -1,409 +0,0 @@
= sPAPR Dynamic Reconfiguration =
sPAPR/"pseries" guests make use of a facility called dynamic-reconfiguration
to handle hotplugging of dynamic "physical" resources like PCI cards, or
"logical"/paravirtual resources like memory, CPUs, and "physical"
host-bridges, which are generally managed by the host/hypervisor and provided
to guests as virtualized resources. The specifics of dynamic-reconfiguration
are documented extensively in PAPR+ v2.7, Section 13.1. This document
provides a summary of that information as it applies to the implementation
within QEMU.
== Dynamic-reconfiguration Connectors ==
To manage hotplug/unplug of these resources, a firmware abstraction known as
a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
resource to the guest, and provide an interface for the guest to manage
configuration/removal of the resource associated with it.
== Device-tree description of DRCs ==
A set of 4 Open Firmware device tree array properties are used to describe
the name/index/power-domain/type of each DRC allocated to a guest at
boot-time. There may be multiple sets of these arrays, rooted at different
paths in the device tree depending on the type of resource the DRCs manage.
In some cases, the DRCs themselves may be provided by a dynamic resource,
such as the DRCs managing PCI slots on a hotplugged PHB. In this case the
arrays would be fetched as part of the device tree retrieval interfaces
for hotplugged resources described under "Guest->Host interface".
The array properties are described below. Each entry/element in an array
describes the DRC identified by the element in the corresponding position
of ibm,drc-indexes:
ibm,drc-names:
first 4-bytes: BE-encoded integer denoting the number of entries
each entry: a NULL-terminated <name> string encoded as a byte array
<name> values for logical/virtual resources are defined in PAPR+ v2.7,
Section 13.5.2.4, and basically consist of the type of the resource
followed by a space and a numerical value that's unique across resources
of that type.
<name> values for "physical" resources such as PCI or VIO devices are
defined as being "location codes", which are the "location labels" of
each encapsulating device, starting from the chassis down to the
individual slot for the device, concatenated by a hyphen. This provides
a mapping of resources to a physical location in a chassis for debugging
purposes. For QEMU, this mapping is less important, so we assign a
location code that conforms to naming specifications, but is simply a
location label for the slot by itself to simplify the implementation.
The naming convention for location labels is documented in detail in
PAPR+ v2.7, Section 12.3.1.5, and in our case amounts to using "C<n>"
for PCI/VIO device slots, where <n> is unique across all PCI/VIO
device slots.
ibm,drc-indexes:
first 4-bytes: BE-encoded integer denoting the number of entries
each 4-byte entry: BE-encoded <index> integer that is unique across all DRCs
in the machine
<index> is arbitrary, but in the case of QEMU we try to maintain the
convention used to assign them to pSeries guests on pHyp:
bit[31:28]: integer encoding of <type>, where <type> is:
1 for CPU resource
2 for PHB resource
3 for VIO resource
4 for PCI resource
8 for Memory resource
bit[27:0]: integer encoding of <id>, where <id> is unique across
all resources of specified type
ibm,drc-power-domains:
first 4-bytes: BE-encoded integer denoting the number of entries
each 4-byte entry: 32-bit, BE-encoded <index> integer that specifies the
power domain the resource will be assigned to. In the case of QEMU
we associated all resources with a "live insertion" domain, where the
power is assumed to be managed automatically. The integer value for
this domain is a special value of -1.
ibm,drc-types:
first 4-bytes: BE-encoded integer denoting the number of entries
each entry: a NULL-terminated <type> string encoded as a byte array
<type> is assigned as follows:
"CPU" for a CPU
"PHB" for a physical host-bridge
"SLOT" for a VIO slot
"28" for a PCI slot
"MEM" for memory resource
== Guest->Host interface to manage dynamic resources ==
Each DRC is given a globally unique DRC Index, and resources associated with
a particular DRC are configured/managed by the guest via a number of RTAS
calls which reference individual DRCs based on the DRC index. This can be
considered the guest->host interface.
rtas-set-power-level:
arg[0]: integer identifying power domain
arg[1]: new power level for the domain, 0-100
output[0]: status, 0 on success
output[1]: power level after command
Set the power level for a specified power domain
rtas-get-power-level:
arg[0]: integer identifying power domain
output[0]: status, 0 on success
output[1]: current power level
Get the power level for a specified power domain
rtas-set-indicator:
arg[0]: integer identifying sensor/indicator type
arg[1]: index of sensor, for DR-related sensors this is generally the
DRC index
arg[2]: desired sensor value
output[0]: status, 0 on success
Set the state of an indicator or sensor. For the purpose of this document we
focus on the indicator/sensor types associated with a DRC. The types are:
9001: isolation-state, controls/indicates whether a device has been made
accessible to a guest
supported sensor values:
0: isolate, device is made unaccessible by guest OS
1: unisolate, device is made available to guest OS
9002: dr-indicator, controls "visual" indicator associated with device
supported sensor values:
0: inactive, resource may be safely removed
1: active, resource is in use and cannot be safely removed
2: identify, used to visually identify slot for interactive hotplug
3: action, in most cases, used in the same manner as identify
9003: allocation-state, generally only used for "logical" DR resources to
request the allocation/deallocation of a resource prior to acquiring
it via isolation-state->unisolate, or after releasing it via
isolation-state->isolate, respectively. for "physical" DR (like PCI
hotplug/unplug) the pre-allocation of the resource is implied and
this sensor is unused.
supported sensor values:
0: unusable, tell firmware/system the resource can be
unallocated/reclaimed and added back to the system resource pool
1: usable, request the resource be allocated/reserved for use by
guest OS
2: exchange, used to allocate a spare resource to use for fail-over
in certain situations. unused in QEMU
3: recover, used to reclaim a previously allocated resource that's
not currently allocated to the guest OS. unused in QEMU
rtas-get-sensor-state:
arg[0]: integer identifying sensor/indicator type
arg[1]: index of sensor, for DR-related sensors this is generally the
DRC index
output[0]: status, 0 on success
Used to read an indicator or sensor value.
For DR-related operations, the only noteworthy sensor is dr-entity-sense,
which has a type value of 9003, as allocation-state does in the case of
rtas-set-indicator. The semantics/encodings of the sensor values are distinct
however:
supported sensor values for dr-entity-sense (9003) sensor:
0: empty,
for physical resources: DRC/slot is empty
for logical resources: unused
1: present,
for physical resources: DRC/slot is populated with a device/resource
for logical resources: resource has been allocated to the DRC
2: unusable,
for physical resources: unused
for logical resources: DRC has no resource allocated to it
3: exchange,
for physical resources: unused
for logical resources: resource available for exchange (see
allocation-state sensor semantics above)
4: recovery,
for physical resources: unused
for logical resources: resource available for recovery (see
allocation-state sensor semantics above)
rtas-ibm-configure-connector:
arg[0]: guest physical address of 4096-byte work area buffer
arg[1]: 0, or address of additional 4096-byte work area buffer. only non-zero
if a prior RTAS response indicated a need for additional memory
output[0]: status:
0: completed transmittal of device-tree node
1: instruct guest to prepare for next DT sibling node
2: instruct guest to prepare for next DT child node
3: instruct guest to prepare for next DT property
4: instruct guest to ascend to parent DT node
5: instruct guest to provide additional work-area buffer
via arg[1]
990x: instruct guest that operation took too long and to try
again later
Used to fetch an OF device-tree description of the resource associated with
a particular DRC. The DRC index is encoded in the first 4-bytes of the first
work area buffer.
Work area layout, using 4-byte offsets:
wa[0]: DRC index of the DRC to fetch device-tree nodes from
wa[1]: 0 (hard-coded)
wa[2]: for next-sibling/next-child response:
wa offset of null-terminated string denoting the new node's name
for next-property response:
wa offset of null-terminated string denoting new property's name
wa[3]: for next-property response (unused otherwise):
byte-length of new property's value
wa[4]: for next-property response (unused otherwise):
new property's value, encoded as an OFDT-compatible byte array
== hotplug/unplug events ==
For most DR operations, the hypervisor will issue host->guest add/remove events
using the EPOW/check-exception notification framework, where the host issues a
check-exception interrupt, then provides an RTAS event log via an
rtas-check-exception call issued by the guest in response. This framework is
documented by PAPR+ v2.7, and already use in by QEMU for generating powerdown
requests via EPOW events.
For DR, this framework has been extended to include hotplug events, which were
previously unneeded due to direct manipulation of DR-related guest userspace
tools by host-level management such as an HMC. This level of management is not
applicable to PowerKVM, hence the reason for extending the notification
framework to support hotplug events.
The format for these EPOW-signalled events is described below under
"hotplug/unplug event structure". Note that these events are not
formally part of the PAPR+ specification, and have been superseded by a
newer format, also described below under "hotplug/unplug event structure",
and so are now deemed a "legacy" format. The formats are similar, but the
"modern" format contains additional fields/flags, which are denoted for the
purposes of this documentation with "#ifdef GUEST_SUPPORTS_MODERN" guards.
QEMU should assume support only for "legacy" fields/flags unless the guest
advertises support for the "modern" format via ibm,client-architecture-support
hcall by setting byte 5, bit 6 of it's ibm,architecture-vec-5 option vector
structure (as described by LoPAPR v11, B.6.2.3). As with "legacy" format events,
"modern" format events are surfaced to the guest via check-exception RTAS calls,
but use a dedicated event source to signal the guest. This event source is
advertised to the guest by the addition of a "hot-plug-events" node under
"/event-sources" node of the guest's device tree using the standard format
described in LoPAPR v11, B.6.12.1.
== hotplug/unplug event structure ==
The hotplug-specific payload in QEMU is implemented as follows (with all values
encoded in big-endian format):
struct rtas_event_log_v6_hp {
#define SECTION_ID_HOTPLUG 0x4850 /* HP */
struct section_header {
uint16_t section_id; /* set to SECTION_ID_HOTPLUG */
uint16_t section_length; /* sizeof(rtas_event_log_v6_hp),
* plus the length of the DRC name
* if a DRC name identifier is
* specified for hotplug_identifier
*/
uint8_t section_version; /* version 1 */
uint8_t section_subtype; /* unused */
uint16_t creator_component_id; /* unused */
} hdr;
#define RTAS_LOG_V6_HP_TYPE_CPU 1
#define RTAS_LOG_V6_HP_TYPE_MEMORY 2
#define RTAS_LOG_V6_HP_TYPE_SLOT 3
#define RTAS_LOG_V6_HP_TYPE_PHB 4
#define RTAS_LOG_V6_HP_TYPE_PCI 5
uint8_t hotplug_type; /* type of resource/device */
#define RTAS_LOG_V6_HP_ACTION_ADD 1
#define RTAS_LOG_V6_HP_ACTION_REMOVE 2
uint8_t hotplug_action; /* action (add/remove) */
#define RTAS_LOG_V6_HP_ID_DRC_NAME 1
#define RTAS_LOG_V6_HP_ID_DRC_INDEX 2
#define RTAS_LOG_V6_HP_ID_DRC_COUNT 3
#ifdef GUEST_SUPPORTS_MODERN
#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
#endif
uint8_t hotplug_identifier; /* type of the resource identifier,
* which serves as the discriminator
* for the 'drc' union field below
*/
#ifdef GUEST_SUPPORTS_MODERN
uint8_t capabilities; /* capability flags, currently unused
* by QEMU
*/
#else
uint8_t reserved;
#endif
union {
uint32_t index; /* DRC index of resource to take action
* on
*/
uint32_t count; /* number of DR resources to take
* action on (guest chooses which)
*/
#ifdef GUEST_SUPPORTS_MODERN
struct {
uint32_t count; /* number of DR resources to take
* action on
*/
uint32_t index; /* DRC index of first resource to take
* action on. guest will take action
* on DRC index <index> through
* DRC index <index + count - 1> in
* sequential order
*/
} count_indexed;
#endif
char name[1]; /* string representing the name of the
* DRC to take action on
*/
} drc;
} QEMU_PACKED;
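As a rough illustration of the layout above, the following sketch serialises a "legacy" hotplug section that identifies its target by DRC index. The helper names (`put_be16`, `put_be32`, `hp_encode_drc_index`) and the 16-byte section length are assumptions made for this example: the length assumes the legacy view of the structure, where the 'drc' union holds a single 32-bit value. It is not QEMU's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_ID_HOTPLUG          0x4850 /* "HP" */
#define RTAS_LOG_V6_HP_TYPE_MEMORY  2
#define RTAS_LOG_V6_HP_ACTION_ADD   1
#define RTAS_LOG_V6_HP_ID_DRC_INDEX 2

/*
 * Assumed legacy section size: 8-byte header, four 1-byte fields,
 * and a 4-byte DRC identifier.
 */
#define HP_LEGACY_SECTION_LEN 16

/* Hypothetical helpers: store a value in big-endian byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xff);
}

/*
 * Hypothetical encoder: returns the number of bytes written,
 * or 0 if the buffer is too small.
 */
static size_t hp_encode_drc_index(uint8_t *buf, size_t buflen, uint8_t type,
                                  uint8_t action, uint32_t drc_index)
{
    if (buflen < HP_LEGACY_SECTION_LEN) {
        return 0;
    }
    put_be16(buf + 0, SECTION_ID_HOTPLUG);     /* section_id */
    put_be16(buf + 2, HP_LEGACY_SECTION_LEN);  /* section_length */
    buf[4] = 1;                                /* section_version */
    buf[5] = 0;                                /* section_subtype, unused */
    put_be16(buf + 6, 0);                      /* creator_component_id, unused */
    buf[8] = type;                             /* hotplug_type */
    buf[9] = action;                           /* hotplug_action */
    buf[10] = RTAS_LOG_V6_HP_ID_DRC_INDEX;     /* hotplug_identifier */
    buf[11] = 0;                               /* reserved */
    put_be32(buf + 12, drc_index);             /* drc.index */
    return HP_LEGACY_SECTION_LEN;
}
```

A caller would pass, for example, `RTAS_LOG_V6_HP_TYPE_MEMORY` and `RTAS_LOG_V6_HP_ACTION_ADD` together with the DRC index of the LMB being added.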
== ibm,lrdr-capacity ==
ibm,lrdr-capacity is a property in the /rtas device tree node that identifies
the dynamic reconfiguration capabilities of the guest. It is a triple
consisting of <phys>, <size> and <maxcpus>.
<phys>, encoded in BE format, represents the maximum address in bytes and
hence the maximum memory that can be allocated to the guest.
<size>, encoded in BE format, represents the size increments in which
memory can be hot-plugged to the guest.
<maxcpus>, a BE-encoded integer, represents the maximum number of
processors that the guest can have.
pseries guests use this property to note the maximum allowed CPUs for the
guest.
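The triple can be sketched as a flat array of big-endian cells. The text above does not give the width of each element, so the layout used here is an assumption for illustration only: <phys> and <size> as 64-bit values split into two 32-bit cells each, and <maxcpus> as a single 32-bit cell. The names `lrdr_capacity_encode` and `LRDR_CELLS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: phys (2 cells), size (2 cells), maxcpus (1 cell). */
#define LRDR_CELLS 5

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xff);
}

/* Hypothetical encoder for the <phys>, <size>, <maxcpus> triple. */
static void lrdr_capacity_encode(uint8_t out[LRDR_CELLS * 4], uint64_t phys,
                                 uint64_t size, uint32_t maxcpus)
{
    put_be32(out + 0, (uint32_t)(phys >> 32));          /* phys, high cell */
    put_be32(out + 4, (uint32_t)(phys & 0xffffffffu));  /* phys, low cell */
    put_be32(out + 8, (uint32_t)(size >> 32));          /* size, high cell */
    put_be32(out + 12, (uint32_t)(size & 0xffffffffu)); /* size, low cell */
    put_be32(out + 16, maxcpus);                        /* maxcpus */
}
```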
== ibm,dynamic-reconfiguration-memory ==
ibm,dynamic-reconfiguration-memory is a device tree node that represents
dynamically reconfigurable logical memory blocks (LMB). This node
is generated only when the guest advertises support for it via the
ibm,client-architecture-support call. Memory that is not dynamically
reconfigurable is represented by /memory nodes. The properties of this
node that are of interest to the sPAPR memory hotplug implementation
in QEMU are described here.
ibm,lmb-size
This 64bit integer defines the size of each dynamically reconfigurable LMB.
ibm,associativity-lookup-arrays
This property defines a lookup array in which the NUMA associativity
information for each LMB can be found. It is a property encoded array
that begins with an integer M, the number of associativity lists, followed
by an integer N, the number of entries per associativity list, and ends
with M associativity lists, each N integers long.
This property provides the same information as given by the ibm,associativity
property in a /memory node. Each assigned LMB has an index value between
0 and M-1 which is used as an index into this table to select which
associativity list to use for the LMB. This index value for each LMB
is defined in ibm,dynamic-memory property.
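The lookup described above can be sketched as follows. The example assumes that every integer in the property is a 32-bit big-endian cell, which the text does not state explicitly. The function names `get_be32` and `assoc_list_lookup` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return a pointer to the first cell of associativity list 'idx' and
 * store the list length N in *n_out. Returns NULL if the index is out
 * of range or the property is shorter than its header claims.
 */
static const uint8_t *assoc_list_lookup(const uint8_t *prop, size_t len,
                                        uint32_t idx, uint32_t *n_out)
{
    uint32_t m, n;
    uint64_t cells;

    if (len < 8) {
        return NULL;
    }
    m = get_be32(prop);      /* number of associativity lists */
    n = get_be32(prop + 4);  /* entries per list */

    /* Two header cells followed by M lists of N cells each. */
    cells = 2 + (uint64_t)m * n;
    if (idx >= m || cells > len / 4) {
        return NULL;
    }
    *n_out = n;
    return prop + 4 * (2 + (size_t)idx * n);
}
```

The index passed in would be the associativity list index recorded for the LMB in ibm,dynamic-memory.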
ibm,dynamic-memory
This property describes the dynamically reconfigurable memory. It is a
property encoded array that has an integer N, the number of LMBs followed
by N LMB list entries.
Each LMB list entry consists of the following elements:
- Logical address of the start of the LMB encoded as a 64bit integer. This
corresponds to reg property in /memory node.
- DRC index of the LMB that corresponds to ibm,my-drc-index property
in a /memory node.
- Four bytes reserved for expansion.
- Associativity list index for the LMB that is used as an index into
ibm,associativity-lookup-arrays property described earlier. This
is used to retrieve the right associativity list to be used for this
LMB.
- A 32bit flags word. The bit at bit position 0x00000008 defines whether
the LMB is assigned to the partition as of boot time.
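A minimal decoding sketch for one LMB list entry follows. It assumes that N, the DRC index and the associativity list index are 32-bit big-endian integers, giving a 24-byte entry; only the 64-bit address, the four reserved bytes and the 32-bit flags word have widths stated above. `struct lmb_entry`, `lmb_entry_decode` and `LMB_FLAG_ASSIGNED` are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LMB_ENTRY_SIZE    24           /* assumed: 8 + 4 + 4 + 4 + 4 bytes */
#define LMB_FLAG_ASSIGNED 0x00000008u  /* LMB assigned to the partition */

struct lmb_entry {
    uint64_t addr;         /* logical start address of the LMB */
    uint32_t drc_index;    /* DRC index of the LMB */
    uint32_t assoc_index;  /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;
};

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Decode entry 'i' of the property; returns 0 on success, -1 on error. */
static int lmb_entry_decode(const uint8_t *prop, size_t len, uint32_t i,
                            struct lmb_entry *out)
{
    const uint8_t *e;

    if (len < 4) {
        return -1;
    }
    if (i >= get_be32(prop) || (len - 4) / LMB_ENTRY_SIZE <= i) {
        return -1;
    }
    e = prop + 4 + (size_t)i * LMB_ENTRY_SIZE;
    out->addr = get_be64(e);
    out->drc_index = get_be32(e + 8);
    /* bytes 12..15 are reserved for expansion */
    out->assoc_index = get_be32(e + 16);
    out->flags = get_be32(e + 20);
    return 0;
}

static int lmb_is_assigned(const struct lmb_entry *e)
{
    return (e->flags & LMB_FLAG_ASSIGNED) != 0;
}
```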
ibm,dynamic-memory-v2
This property describes the dynamically reconfigurable memory. This is
an alternate and newer way to describe dynamically reconfigurable memory.
It is a property encoded array that has an integer N (the number of
LMB set entries) followed by N LMB set entries. There is an LMB set entry
for each sequential group of LMBs that share common attributes.
Each LMB set entry consists of the following elements:
- Number of sequential LMBs in the entry represented by a 32bit integer.
- Logical address of the first LMB in the set encoded as a 64bit integer.
- DRC index of the first LMB in the set.
- Associativity list index that is used as an index into
ibm,associativity-lookup-arrays property described earlier. This
is used to retrieve the right associativity list to be used for all
the LMBs in this set.
- A 32bit flags word that applies to all the LMBs in the set.
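The sketch below decodes one LMB set entry and derives the address and DRC index of the k-th LMB in the set. It rests on several assumptions that the text above does not spell out: the DRC index and associativity list index are 32-bit big-endian integers (a 24-byte entry), consecutive LMBs in a set are spaced by ibm,lmb-size, and their DRC indexes increase by one. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LMB_SET_ENTRY_SIZE 24  /* assumed: 4 + 8 + 4 + 4 + 4 bytes */

struct lmb_set {
    uint32_t count;        /* number of sequential LMBs in the set */
    uint64_t base_addr;    /* logical address of the first LMB */
    uint32_t first_drc;    /* DRC index of the first LMB */
    uint32_t assoc_index;  /* shared associativity list index */
    uint32_t flags;        /* shared flags word */
};

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one set entry from its big-endian on-wire form. */
static void lmb_set_decode(const uint8_t e[LMB_SET_ENTRY_SIZE],
                           struct lmb_set *s)
{
    s->count = get_be32(e);
    s->base_addr = ((uint64_t)get_be32(e + 4) << 32) | get_be32(e + 8);
    s->first_drc = get_be32(e + 12);
    s->assoc_index = get_be32(e + 16);
    s->flags = get_be32(e + 20);
}

/*
 * Compute the address and DRC index of member 'k' of the set.
 * Returns 0 on success, -1 if 'k' is outside the set.
 */
static int lmb_set_member(const struct lmb_set *s, uint32_t k,
                          uint64_t lmb_size, uint64_t *addr, uint32_t *drc)
{
    if (k >= s->count) {
        return -1;
    }
    *addr = s->base_addr + (uint64_t)k * lmb_size;
    *drc = s->first_drc + k;
    return 0;
}
```

Compared with ibm,dynamic-memory, a single entry of this form stands in for `count` individual LMB list entries that share the same associativity index and flags.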
[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867


@ -0,0 +1,89 @@
===================================
Hypervisor calls and the Ultravisor
===================================
On PPC64 systems supporting Protected Execution Facility (PEF), system memory
can be placed in a secured region to which only an ultravisor running in firmware
can provide access. pSeries guests on such systems can communicate with
the ultravisor (via ultracalls) to switch to a secure virtual machine (SVM) mode
where the guest's memory is relocated to this secured region, making its memory
inaccessible to normal processes/guests running on the host.
The various ultracalls/hypercalls relating to SVM mode are currently only
documented internally, but are planned for direct inclusion into the Linux on
Power Architecture Reference document ([LoPAR]_). An internal ACR has been filed
to reserve a hypercall number range specific to this use case to avoid any
future conflicts with the IBM internally maintained Power Architecture Platform
Reference (PAPR+) specification. This document summarizes some of
these details as they relate to QEMU.
Hypercalls needed by the ultravisor
===================================
Switching to SVM mode involves a number of hcalls issued by the ultravisor to
the hypervisor to orchestrate the movement of guest memory to secure memory and
various other aspects of the SVM mode. Numbers are assigned for these hcalls
within the reserved range ``0xEF00-0xEF80``. The hcalls relevant to QEMU are
documented below.
``H_TPM_COMM`` (``0xef10``)
---------------------------
SVM file systems are encrypted using a symmetric key. This key is then
wrapped/encrypted using the public key of a trusted system which has the private
key stored in the system's TPM. An Ultravisor will use this hcall to
unwrap/unseal the symmetric key using the system's TPM device or a TPM Resource
Manager associated with the device.
The Ultravisor sets up a separate session key with the TPM in advance during
host system boot. All sensitive in and out values will be encrypted using the
session key. Though the hypervisor will see the in and out buffers in raw form,
any sensitive contents will generally be encrypted using this session key.
Arguments:
``r3``: ``H_TPM_COMM`` (``0xef10``)
``r4``: ``TPM`` operation, one of:
``TPM_COMM_OP_EXECUTE`` (``0x1``): send a request to a TPM and receive a
response, opening a new TPM session if one has not already been opened.
``TPM_COMM_OP_CLOSE_SESSION`` (``0x2``): close the existing TPM session, if
any.
``r5``: ``in_buffer``, guest physical address of buffer containing the
request. Caller may use the same address for both request and response.
``r6``: ``in_size``, size of the in buffer. Must be less than or equal to
4 KB.
``r7``: ``out_buffer``, guest physical address of buffer to store the
response. Caller may use the same address for both request and response.
``r8``: ``out_size``, size of the out buffer. Must be at least 4 KB, as this
is the maximum request/response size supported by most TPM implementations,
including the TPM Resource Manager in the Linux kernel.
Return values:
``r3``: one of the following values:
``H_Success``: request processed successfully.
``H_PARAMETER``: invalid TPM operation.
``H_P2``: ``in_buffer`` is invalid.
``H_P3``: ``in_size`` is invalid.
``H_P4``: ``out_buffer`` is invalid.
``H_P5``: ``out_size`` is invalid.
``H_RESOURCE``: problem communicating with TPM.
``H_FUNCTION``: TPM access is not currently allowed/configured.
``r4``: For ``TPM_COMM_OP_EXECUTE``, the size of the response will be stored
here upon success.
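The size constraints on the arguments can be summarised in a small validation sketch. This is an illustration under stated assumptions, not QEMU's handler: the status enum uses placeholder values rather than the real ``H_*`` return codes, "4 KB" is taken to mean 4096 bytes, the close-session operation is assumed not to look at the buffers, and the buffer address checks (``H_P2``/``H_P4``) are left out. The name ``tpm_comm_check_args`` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    TPM_COMM_OP_EXECUTE = 0x1,
    TPM_COMM_OP_CLOSE_SESSION = 0x2,
};

/* Assumption: "4 KB" means 4096 bytes. */
#define TPM_COMM_BUFSIZE 4096u

/* Placeholder status codes; these are NOT the real H_* numeric values. */
enum tpm_comm_status {
    ST_SUCCESS,    /* stands in for H_Success */
    ST_PARAMETER,  /* stands in for H_PARAMETER */
    ST_P3,         /* stands in for H_P3 */
    ST_P5,         /* stands in for H_P5 */
};

/* Hypothetical argument check covering r4 (op), r6 and r8 (sizes). */
static enum tpm_comm_status tpm_comm_check_args(uint64_t op,
                                                uint64_t in_size,
                                                uint64_t out_size)
{
    switch (op) {
    case TPM_COMM_OP_CLOSE_SESSION:
        /* Assumed: closing a session does not use the buffers. */
        return ST_SUCCESS;
    case TPM_COMM_OP_EXECUTE:
        break;
    default:
        return ST_PARAMETER;  /* invalid TPM operation */
    }
    if (in_size > TPM_COMM_BUFSIZE) {
        return ST_P3;         /* in_size must be <= 4 KB */
    }
    if (out_size < TPM_COMM_BUFSIZE) {
        return ST_P5;         /* out_size must be >= 4 KB */
    }
    return ST_SUCCESS;
}
```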


@ -1,76 +0,0 @@
On PPC64 systems supporting Protected Execution Facility (PEF), system
memory can be placed in a secured region where only an "ultravisor"
running in firmware can provide access to it. pseries guests on such
systems can communicate with the ultravisor (via ultracalls) to switch to a
secure VM mode (SVM) where the guest's memory is relocated to this secured
region, making its memory inaccessible to normal processes/guests running on
the host.
The various ultracalls/hypercalls relating to SVM mode are currently
only documented internally, but are planned for direct inclusion into the
public OpenPOWER version of the PAPR specification (LoPAPR/LoPAR). An internal
ACR has been filed to reserve a hypercall number range specific to this
use-case to avoid any future conflicts with the internally-maintained PAPR
specification. This document summarizes some of these details as they relate
to QEMU.
== hypercalls needed by the ultravisor ==
Switching to SVM mode involves a number of hcalls issued by the ultravisor
to the hypervisor to orchestrate the movement of guest memory to secure
memory and various other aspects of SVM mode. Numbers are assigned for these
hcalls within the reserved range 0xEF00-0xEF80. The below documents the
hcalls relevant to QEMU.
- H_TPM_COMM (0xef10)
For TPM_COMM_OP_EXECUTE operation:
Send a request to a TPM and receive a response, opening a new TPM session
if one has not already been opened.
For TPM_COMM_OP_CLOSE_SESSION operation:
Close the existing TPM session, if any.
Arguments:
r3 : H_TPM_COMM (0xef10)
r4 : TPM operation, one of:
TPM_COMM_OP_EXECUTE (0x1)
TPM_COMM_OP_CLOSE_SESSION (0x2)
r5 : in_buffer, guest physical address of buffer containing the request
- Caller may use the same address for both request and response
r6 : in_size, size of the in buffer
- Must be less than or equal to 4KB
r7 : out_buffer, guest physical address of buffer to store the response
- Caller may use the same address for both request and response
r8 : out_size, size of the out buffer
- Must be at least 4KB, as this is the maximum request/response size
supported by most TPM implementations, including the TPM Resource
Manager in the Linux kernel.
Return values:
r3 : H_Success request processed successfully
H_PARAMETER invalid TPM operation
H_P2 in_buffer is invalid
H_P3 in_size is invalid
H_P4 out_buffer is invalid
H_P5 out_size is invalid
H_RESOURCE problem communicating with TPM
H_FUNCTION TPM access is not currently allowed/configured
r4 : For TPM_COMM_OP_EXECUTE, the size of the response will be stored here
upon success.
Use-case/notes:
SVM filesystems are encrypted using a symmetric key. This key is then
wrapped/encrypted using the public key of a trusted system which has the
private key stored in the system's TPM. An Ultravisor will use this
hcall to unwrap/unseal the symmetric key using the system's TPM device
or a TPM Resource Manager associated with the device.
The Ultravisor sets up a separate session key with the TPM in advance
during host system boot. All sensitive in and out values will be
encrypted using the session key. Though the hypervisor will see the 'in'
and 'out' buffers in raw form, any sensitive contents will generally be
encrypted using this session key.


@ -110,16 +110,12 @@ can also be found in QEMU documentation:
.. toctree::
:maxdepth: 1
../../specs/ppc-spapr-hotplug.rst
../../specs/ppc-spapr-hcalls.rst
../../specs/ppc-spapr-numa.rst
../../specs/ppc-spapr-uv-hcalls.rst
../../specs/ppc-spapr-xive.rst
Other documentation available in QEMU docs directory:
* Hot plug (``/docs/specs/ppc-spapr-hotplug.txt``).
* Hypervisor calls needed by the Ultravisor
(``/docs/specs/ppc-spapr-uv-hcalls.txt``).
Switching between the KVM-PR and KVM-HV kernel module
=====================================================


@ -22,7 +22,6 @@
#include "hw/irq.h"
#include "hw/qdev-properties.h"
#include "qom/object.h"
#include "sysemu/sysemu.h"
#include "trace.h"
#define phb_error(phb, fmt, ...) \
@ -228,16 +227,16 @@ static void pnv_phb4_check_mbt(PnvPHB4 *phb, uint32_t index)
/* TODO: Figure out how to implement/decode AOMASK */
/* Check if it matches an enabled MMIO region in the PEC stack */
if (memory_region_is_mapped(&phb->stack->mmbar0) &&
base >= phb->stack->mmio0_base &&
(base + size) <= (phb->stack->mmio0_base + phb->stack->mmio0_size)) {
parent = &phb->stack->mmbar0;
base -= phb->stack->mmio0_base;
} else if (memory_region_is_mapped(&phb->stack->mmbar1) &&
base >= phb->stack->mmio1_base &&
(base + size) <= (phb->stack->mmio1_base + phb->stack->mmio1_size)) {
parent = &phb->stack->mmbar1;
base -= phb->stack->mmio1_base;
if (memory_region_is_mapped(&phb->mmbar0) &&
base >= phb->mmio0_base &&
(base + size) <= (phb->mmio0_base + phb->mmio0_size)) {
parent = &phb->mmbar0;
base -= phb->mmio0_base;
} else if (memory_region_is_mapped(&phb->mmbar1) &&
base >= phb->mmio1_base &&
(base + size) <= (phb->mmio1_base + phb->mmio1_size)) {
parent = &phb->mmbar1;
base -= phb->mmio1_base;
} else {
phb_error(phb, "PHB MBAR %d out of parent bounds", index);
return;
@ -673,7 +672,7 @@ static uint64_t pnv_phb4_reg_read(void *opaque, hwaddr off, unsigned size)
switch (off) {
case PHB_VERSION:
return phb->version;
return PNV_PHB4_PEC_GET_CLASS(phb->pec)->version;
/* Read-only */
case PHB_PHB4_GEN_CAP:
@ -861,44 +860,65 @@ const MemoryRegionOps pnv_phb4_xscom_ops = {
static uint64_t pnv_pec_stk_nest_xscom_read(void *opaque, hwaddr addr,
unsigned size)
{
PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
PnvPHB4 *phb = PNV_PHB4(opaque);
uint32_t reg = addr >> 3;
/* TODO: add list of allowed registers and error out if not */
return stack->nest_regs[reg];
return phb->nest_regs[reg];
}
static void pnv_phb4_update_regions(PnvPhb4PecStack *stack)
/*
* Return the 'stack_no' of a PHB4. 'stack_no' is the order
* the PHB4 occupies in the PEC. This is the reverse of what
* pnv_phb4_pec_get_phb_id() does.
*
* E.g. a phb with phb_id = 4 and pec->index = 1 (PEC1) will
* be the second phb (stack_no = 1) of the PEC.
*/
static int pnv_phb4_get_phb_stack_no(PnvPHB4 *phb)
{
PnvPHB4 *phb = stack->phb;
PnvPhb4PecState *pec = phb->pec;
PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
int index = pec->index;
int stack_no = phb->phb_id;
while (index--) {
stack_no -= pecc->num_phbs[index];
}
return stack_no;
}
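The relationship between pnv_phb4_get_phb_stack_no() and pnv_phb4_pec_get_phb_id() can be checked in isolation with the sketch below. The per-PEC PHB counts in `num_phbs` ({ 1, 2, 3 }) are an assumption made for this example and do not appear in this hunk; the function names are hypothetical stand-ins that take plain integers instead of QEMU objects.

```c
#include <assert.h>

#define NUM_PECS 3

/* Assumed number of PHBs per PEC, indexed by PEC index. */
static const int num_phbs[NUM_PECS] = { 1, 2, 3 };

/* Forward mapping: (PEC index, stack number) -> chip-wide PHB id. */
static int phb_id_from_stack(int pec_index, int stack_no)
{
    int offset = 0;

    /* Add up the PHBs of every PEC that precedes this one. */
    while (pec_index--) {
        offset += num_phbs[pec_index];
    }
    return offset + stack_no;
}

/* Reverse mapping: (PEC index, PHB id) -> stack number within the PEC. */
static int stack_no_from_phb_id(int pec_index, int phb_id)
{
    int stack_no = phb_id;

    /* Subtract the PHBs of every PEC that precedes this one. */
    while (pec_index--) {
        stack_no -= num_phbs[pec_index];
    }
    return stack_no;
}
```

With the assumed table, the PHB with id 2 is the second PHB (stack number 1) of the PEC with index 1, and applying the two functions in sequence returns the original value.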
static void pnv_phb4_update_regions(PnvPHB4 *phb)
{
/* Unmap first always */
if (memory_region_is_mapped(&phb->mr_regs)) {
memory_region_del_subregion(&stack->phbbar, &phb->mr_regs);
memory_region_del_subregion(&phb->phbbar, &phb->mr_regs);
}
if (memory_region_is_mapped(&phb->xsrc.esb_mmio)) {
memory_region_del_subregion(&stack->intbar, &phb->xsrc.esb_mmio);
memory_region_del_subregion(&phb->intbar, &phb->xsrc.esb_mmio);
}
/* Map registers if enabled */
if (memory_region_is_mapped(&stack->phbbar)) {
memory_region_add_subregion(&stack->phbbar, 0, &phb->mr_regs);
if (memory_region_is_mapped(&phb->phbbar)) {
memory_region_add_subregion(&phb->phbbar, 0, &phb->mr_regs);
}
/* Map ESB if enabled */
if (memory_region_is_mapped(&stack->intbar)) {
memory_region_add_subregion(&stack->intbar, 0, &phb->xsrc.esb_mmio);
if (memory_region_is_mapped(&phb->intbar)) {
memory_region_add_subregion(&phb->intbar, 0, &phb->xsrc.esb_mmio);
}
/* Check/update m32 */
pnv_phb4_check_all_mbt(phb);
}
static void pnv_pec_stk_update_map(PnvPhb4PecStack *stack)
static void pnv_pec_phb_update_map(PnvPHB4 *phb)
{
PnvPhb4PecState *pec = stack->pec;
PnvPhb4PecState *pec = phb->pec;
MemoryRegion *sysmem = get_system_memory();
uint64_t bar_en = stack->nest_regs[PEC_NEST_STK_BAR_EN];
uint64_t bar_en = phb->nest_regs[PEC_NEST_STK_BAR_EN];
int stack_no = pnv_phb4_get_phb_stack_no(phb);
uint64_t bar, mask, size;
char name[64];
@ -911,106 +931,106 @@ static void pnv_pec_stk_update_map(PnvPhb4PecStack *stack)
*/
/* Handle unmaps */
if (memory_region_is_mapped(&stack->mmbar0) &&
if (memory_region_is_mapped(&phb->mmbar0) &&
!(bar_en & PEC_NEST_STK_BAR_EN_MMIO0)) {
memory_region_del_subregion(sysmem, &stack->mmbar0);
memory_region_del_subregion(sysmem, &phb->mmbar0);
}
if (memory_region_is_mapped(&stack->mmbar1) &&
if (memory_region_is_mapped(&phb->mmbar1) &&
!(bar_en & PEC_NEST_STK_BAR_EN_MMIO1)) {
memory_region_del_subregion(sysmem, &stack->mmbar1);
memory_region_del_subregion(sysmem, &phb->mmbar1);
}
if (memory_region_is_mapped(&stack->phbbar) &&
if (memory_region_is_mapped(&phb->phbbar) &&
!(bar_en & PEC_NEST_STK_BAR_EN_PHB)) {
memory_region_del_subregion(sysmem, &stack->phbbar);
memory_region_del_subregion(sysmem, &phb->phbbar);
}
if (memory_region_is_mapped(&stack->intbar) &&
if (memory_region_is_mapped(&phb->intbar) &&
!(bar_en & PEC_NEST_STK_BAR_EN_INT)) {
memory_region_del_subregion(sysmem, &stack->intbar);
memory_region_del_subregion(sysmem, &phb->intbar);
}
/* Update PHB */
pnv_phb4_update_regions(stack);
pnv_phb4_update_regions(phb);
/* Handle maps */
if (!memory_region_is_mapped(&stack->mmbar0) &&
if (!memory_region_is_mapped(&phb->mmbar0) &&
(bar_en & PEC_NEST_STK_BAR_EN_MMIO0)) {
bar = stack->nest_regs[PEC_NEST_STK_MMIO_BAR0] >> 8;
mask = stack->nest_regs[PEC_NEST_STK_MMIO_BAR0_MASK];
bar = phb->nest_regs[PEC_NEST_STK_MMIO_BAR0] >> 8;
mask = phb->nest_regs[PEC_NEST_STK_MMIO_BAR0_MASK];
size = ((~mask) >> 8) + 1;
snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-mmio0",
pec->chip_id, pec->index, stack->stack_no);
memory_region_init(&stack->mmbar0, OBJECT(stack), name, size);
memory_region_add_subregion(sysmem, bar, &stack->mmbar0);
stack->mmio0_base = bar;
stack->mmio0_size = size;
snprintf(name, sizeof(name), "pec-%d.%d-phb-%d-mmio0",
pec->chip_id, pec->index, stack_no);
memory_region_init(&phb->mmbar0, OBJECT(phb), name, size);
memory_region_add_subregion(sysmem, bar, &phb->mmbar0);
phb->mmio0_base = bar;
phb->mmio0_size = size;
}
if (!memory_region_is_mapped(&stack->mmbar1) &&
if (!memory_region_is_mapped(&phb->mmbar1) &&
(bar_en & PEC_NEST_STK_BAR_EN_MMIO1)) {
bar = stack->nest_regs[PEC_NEST_STK_MMIO_BAR1] >> 8;
mask = stack->nest_regs[PEC_NEST_STK_MMIO_BAR1_MASK];
bar = phb->nest_regs[PEC_NEST_STK_MMIO_BAR1] >> 8;
mask = phb->nest_regs[PEC_NEST_STK_MMIO_BAR1_MASK];
size = ((~mask) >> 8) + 1;
snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-mmio1",
pec->chip_id, pec->index, stack->stack_no);
memory_region_init(&stack->mmbar1, OBJECT(stack), name, size);
memory_region_add_subregion(sysmem, bar, &stack->mmbar1);
stack->mmio1_base = bar;
stack->mmio1_size = size;
snprintf(name, sizeof(name), "pec-%d.%d-phb-%d-mmio1",
pec->chip_id, pec->index, stack_no);
memory_region_init(&phb->mmbar1, OBJECT(phb), name, size);
memory_region_add_subregion(sysmem, bar, &phb->mmbar1);
phb->mmio1_base = bar;
phb->mmio1_size = size;
}
if (!memory_region_is_mapped(&stack->phbbar) &&
if (!memory_region_is_mapped(&phb->phbbar) &&
(bar_en & PEC_NEST_STK_BAR_EN_PHB)) {
bar = stack->nest_regs[PEC_NEST_STK_PHB_REGS_BAR] >> 8;
bar = phb->nest_regs[PEC_NEST_STK_PHB_REGS_BAR] >> 8;
size = PNV_PHB4_NUM_REGS << 3;
snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-phb",
pec->chip_id, pec->index, stack->stack_no);
memory_region_init(&stack->phbbar, OBJECT(stack), name, size);
memory_region_add_subregion(sysmem, bar, &stack->phbbar);
snprintf(name, sizeof(name), "pec-%d.%d-phb-%d",
pec->chip_id, pec->index, stack_no);
memory_region_init(&phb->phbbar, OBJECT(phb), name, size);
memory_region_add_subregion(sysmem, bar, &phb->phbbar);
}
if (!memory_region_is_mapped(&stack->intbar) &&
if (!memory_region_is_mapped(&phb->intbar) &&
(bar_en & PEC_NEST_STK_BAR_EN_INT)) {
bar = stack->nest_regs[PEC_NEST_STK_INT_BAR] >> 8;
bar = phb->nest_regs[PEC_NEST_STK_INT_BAR] >> 8;
size = PNV_PHB4_MAX_INTs << 16;
snprintf(name, sizeof(name), "pec-%d.%d-stack-%d-int",
stack->pec->chip_id, stack->pec->index, stack->stack_no);
memory_region_init(&stack->intbar, OBJECT(stack), name, size);
memory_region_add_subregion(sysmem, bar, &stack->intbar);
snprintf(name, sizeof(name), "pec-%d.%d-phb-%d-int",
phb->pec->chip_id, phb->pec->index, stack_no);
memory_region_init(&phb->intbar, OBJECT(phb), name, size);
memory_region_add_subregion(sysmem, bar, &phb->intbar);
}
/* Update PHB */
pnv_phb4_update_regions(stack);
pnv_phb4_update_regions(phb);
}
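The BAR arithmetic used in the map path above (`bar = reg >> 8` and `size = ((~mask) >> 8) + 1`) can be tried out on its own with the following sketch. The function names `pec_bar_base` and `pec_bar_size` are hypothetical, and the register values used to exercise them are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Base address: the register holds the address shifted left by 8 bits. */
static uint64_t pec_bar_base(uint64_t bar_reg)
{
    return bar_reg >> 8;
}

/*
 * Region size: the bits cleared in the mask register are the address
 * bits that vary inside the region, so inverting the mask, undoing
 * the 8-bit shift and adding one gives the size in bytes.
 */
static uint64_t pec_bar_size(uint64_t mask_reg)
{
    return ((~mask_reg) >> 8) + 1;
}
```

For instance, a mask register value of 0xFFFFFF0000000000 yields a 4 GiB region (0x100000000 bytes).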
static void pnv_pec_stk_nest_xscom_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
{
PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
PnvPhb4PecState *pec = stack->pec;
PnvPHB4 *phb = PNV_PHB4(opaque);
PnvPhb4PecState *pec = phb->pec;
uint32_t reg = addr >> 3;
switch (reg) {
case PEC_NEST_STK_PCI_NEST_FIR:
stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] = val;
phb->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] = val;
break;
case PEC_NEST_STK_PCI_NEST_FIR_CLR:
stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] &= val;
phb->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] &= val;
break;
case PEC_NEST_STK_PCI_NEST_FIR_SET:
stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] |= val;
phb->nest_regs[PEC_NEST_STK_PCI_NEST_FIR] |= val;
break;
case PEC_NEST_STK_PCI_NEST_FIR_MSK:
stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] = val;
phb->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] = val;
break;
case PEC_NEST_STK_PCI_NEST_FIR_MSKC:
stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] &= val;
phb->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] &= val;
break;
case PEC_NEST_STK_PCI_NEST_FIR_MSKS:
stack->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] |= val;
phb->nest_regs[PEC_NEST_STK_PCI_NEST_FIR_MSK] |= val;
break;
case PEC_NEST_STK_PCI_NEST_FIR_ACT0:
case PEC_NEST_STK_PCI_NEST_FIR_ACT1:
stack->nest_regs[reg] = val;
phb->nest_regs[reg] = val;
break;
case PEC_NEST_STK_PCI_NEST_FIR_WOF:
stack->nest_regs[reg] = 0;
phb->nest_regs[reg] = 0;
break;
case PEC_NEST_STK_ERR_REPORT_0:
case PEC_NEST_STK_ERR_REPORT_1:
@ -1018,39 +1038,39 @@ static void pnv_pec_stk_nest_xscom_write(void *opaque, hwaddr addr,
/* Flag error ? */
break;
case PEC_NEST_STK_PBCQ_MODE:
stack->nest_regs[reg] = val & 0xff00000000000000ull;
phb->nest_regs[reg] = val & 0xff00000000000000ull;
break;
case PEC_NEST_STK_MMIO_BAR0:
case PEC_NEST_STK_MMIO_BAR0_MASK:
case PEC_NEST_STK_MMIO_BAR1:
case PEC_NEST_STK_MMIO_BAR1_MASK:
if (stack->nest_regs[PEC_NEST_STK_BAR_EN] &
if (phb->nest_regs[PEC_NEST_STK_BAR_EN] &
(PEC_NEST_STK_BAR_EN_MMIO0 |
PEC_NEST_STK_BAR_EN_MMIO1)) {
phb_pec_error(pec, "Changing enabled BAR unsupported\n");
}
stack->nest_regs[reg] = val & 0xffffffffff000000ull;
phb->nest_regs[reg] = val & 0xffffffffff000000ull;
break;
case PEC_NEST_STK_PHB_REGS_BAR:
if (stack->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_PHB) {
if (phb->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_PHB) {
phb_pec_error(pec, "Changing enabled BAR unsupported\n");
}
stack->nest_regs[reg] = val & 0xffffffffffc00000ull;
phb->nest_regs[reg] = val & 0xffffffffffc00000ull;
break;
case PEC_NEST_STK_INT_BAR:
if (stack->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_INT) {
if (phb->nest_regs[PEC_NEST_STK_BAR_EN] & PEC_NEST_STK_BAR_EN_INT) {
phb_pec_error(pec, "Changing enabled BAR unsupported\n");
}
stack->nest_regs[reg] = val & 0xfffffff000000000ull;
phb->nest_regs[reg] = val & 0xfffffff000000000ull;
break;
case PEC_NEST_STK_BAR_EN:
stack->nest_regs[reg] = val & 0xf000000000000000ull;
pnv_pec_stk_update_map(stack);
phb->nest_regs[reg] = val & 0xf000000000000000ull;
pnv_pec_phb_update_map(phb);
break;
case PEC_NEST_STK_DATA_FRZ_TYPE:
case PEC_NEST_STK_PBCQ_TUN_BAR:
/* Not used for now */
stack->nest_regs[reg] = val;
phb->nest_regs[reg] = val;
break;
default:
qemu_log_mask(LOG_UNIMP, "phb4_pec: nest_xscom_write 0x%"HWADDR_PRIx
@ -1071,54 +1091,54 @@ static const MemoryRegionOps pnv_pec_stk_nest_xscom_ops = {
static uint64_t pnv_pec_stk_pci_xscom_read(void *opaque, hwaddr addr,
unsigned size)
{
PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
PnvPHB4 *phb = PNV_PHB4(opaque);
uint32_t reg = addr >> 3;
/* TODO: add list of allowed registers and error out if not */
return stack->pci_regs[reg];
return phb->pci_regs[reg];
}
static void pnv_pec_stk_pci_xscom_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
{
PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(opaque);
PnvPHB4 *phb = PNV_PHB4(opaque);
uint32_t reg = addr >> 3;
switch (reg) {
case PEC_PCI_STK_PCI_FIR:
stack->pci_regs[reg] = val;
phb->pci_regs[reg] = val;
break;
case PEC_PCI_STK_PCI_FIR_CLR:
stack->pci_regs[PEC_PCI_STK_PCI_FIR] &= val;
phb->pci_regs[PEC_PCI_STK_PCI_FIR] &= val;
break;
case PEC_PCI_STK_PCI_FIR_SET:
stack->pci_regs[PEC_PCI_STK_PCI_FIR] |= val;
phb->pci_regs[PEC_PCI_STK_PCI_FIR] |= val;
break;
case PEC_PCI_STK_PCI_FIR_MSK:
stack->pci_regs[reg] = val;
phb->pci_regs[reg] = val;
break;
case PEC_PCI_STK_PCI_FIR_MSKC:
stack->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] &= val;
phb->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] &= val;
break;
case PEC_PCI_STK_PCI_FIR_MSKS:
stack->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] |= val;
phb->pci_regs[PEC_PCI_STK_PCI_FIR_MSK] |= val;
break;
case PEC_PCI_STK_PCI_FIR_ACT0:
case PEC_PCI_STK_PCI_FIR_ACT1:
stack->pci_regs[reg] = val;
phb->pci_regs[reg] = val;
break;
case PEC_PCI_STK_PCI_FIR_WOF:
stack->pci_regs[reg] = 0;
phb->pci_regs[reg] = 0;
break;
case PEC_PCI_STK_ETU_RESET:
stack->pci_regs[reg] = val & 0x8000000000000000ull;
phb->pci_regs[reg] = val & 0x8000000000000000ull;
/* TODO: Implement reset */
break;
case PEC_PCI_STK_PBAIB_ERR_REPORT:
break;
case PEC_PCI_STK_PBAIB_TX_CMD_CRED:
case PEC_PCI_STK_PBAIB_TX_DAT_CRED:
stack->pci_regs[reg] = val;
phb->pci_regs[reg] = val;
break;
default:
qemu_log_mask(LOG_UNIMP, "phb4_pec_stk: pci_xscom_write 0x%"HWADDR_PRIx
@ -1362,7 +1382,7 @@ int pnv_phb4_pec_get_phb_id(PnvPhb4PecState *pec, int stack_index)
int offset = 0;
while (index--) {
offset += pecc->num_stacks[index];
offset += pecc->num_phbs[index];
}
return offset + stack_index;
@ -1459,9 +1479,9 @@ static AddressSpace *pnv_phb4_dma_iommu(PCIBus *bus, void *opaque, int devfn)
static void pnv_phb4_xscom_realize(PnvPHB4 *phb)
{
PnvPhb4PecStack *stack = phb->stack;
PnvPhb4PecState *pec = stack->pec;
PnvPhb4PecState *pec = phb->pec;
PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
int stack_no = pnv_phb4_get_phb_stack_no(phb);
uint32_t pec_nest_base;
uint32_t pec_pci_base;
char name[64];
@ -1469,22 +1489,22 @@ static void pnv_phb4_xscom_realize(PnvPHB4 *phb)
assert(pec);
/* Initialize the XSCOM regions for the stack registers */
snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest-stack-%d",
pec->chip_id, pec->index, stack->stack_no);
pnv_xscom_region_init(&stack->nest_regs_mr, OBJECT(stack),
&pnv_pec_stk_nest_xscom_ops, stack, name,
snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest-phb-%d",
pec->chip_id, pec->index, stack_no);
pnv_xscom_region_init(&phb->nest_regs_mr, OBJECT(phb),
&pnv_pec_stk_nest_xscom_ops, phb, name,
PHB4_PEC_NEST_STK_REGS_COUNT);
snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-stack-%d",
pec->chip_id, pec->index, stack->stack_no);
pnv_xscom_region_init(&stack->pci_regs_mr, OBJECT(stack),
&pnv_pec_stk_pci_xscom_ops, stack, name,
snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-phb-%d",
pec->chip_id, pec->index, stack_no);
pnv_xscom_region_init(&phb->pci_regs_mr, OBJECT(phb),
&pnv_pec_stk_pci_xscom_ops, phb, name,
PHB4_PEC_PCI_STK_REGS_COUNT);
/* PHB pass-through */
snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-stack-%d-phb",
pec->chip_id, pec->index, stack->stack_no);
pnv_xscom_region_init(&stack->phb_regs_mr, OBJECT(phb),
snprintf(name, sizeof(name), "xscom-pec-%d.%d-pci-phb-%d",
pec->chip_id, pec->index, stack_no);
pnv_xscom_region_init(&phb->phb_regs_mr, OBJECT(phb),
&pnv_phb4_xscom_ops, phb, name, 0x40);
pec_nest_base = pecc->xscom_nest_base(pec);
@ -1492,15 +1512,15 @@ static void pnv_phb4_xscom_realize(PnvPHB4 *phb)
/* Populate the XSCOM address space. */
pnv_xscom_add_subregion(pec->chip,
pec_nest_base + 0x40 * (stack->stack_no + 1),
&stack->nest_regs_mr);
pec_nest_base + 0x40 * (stack_no + 1),
&phb->nest_regs_mr);
pnv_xscom_add_subregion(pec->chip,
pec_pci_base + 0x40 * (stack->stack_no + 1),
&stack->pci_regs_mr);
pec_pci_base + 0x40 * (stack_no + 1),
&phb->pci_regs_mr);
pnv_xscom_add_subregion(pec->chip,
pec_pci_base + PNV9_XSCOM_PEC_PCI_STK0 +
0x40 * stack->stack_no,
&stack->phb_regs_mr);
0x40 * stack_no,
&phb->phb_regs_mr);
}
static void pnv_phb4_instance_init(Object *obj)
@ -1513,8 +1533,8 @@ static void pnv_phb4_instance_init(Object *obj)
object_initialize_child(obj, "source", &phb->xsrc, TYPE_XIVE_SOURCE);
}
static PnvPhb4PecStack *pnv_phb4_get_stack(PnvChip *chip, PnvPHB4 *phb,
Error **errp)
static PnvPhb4PecState *pnv_phb4_get_pec(PnvChip *chip, PnvPHB4 *phb,
Error **errp)
{
Pnv9Chip *chip9 = PNV9_CHIP(chip);
int chip_id = phb->chip_id;
@ -1523,14 +1543,14 @@ static PnvPhb4PecStack *pnv_phb4_get_stack(PnvChip *chip, PnvPHB4 *phb,
for (i = 0; i < chip->num_pecs; i++) {
/*
* For each PEC, check the amount of stacks it supports
* and see if the given phb4 index matches a stack.
* For each PEC, check the number of PHBs it supports
* and see if the given phb4 index matches one of them.
*/
PnvPhb4PecState *pec = &chip9->pecs[i];
for (j = 0; j < pec->num_stacks; j++) {
for (j = 0; j < pec->num_phbs; j++) {
if (index == pnv_phb4_pec_get_phb_id(pec, j)) {
return &pec->stacks[j];
return pec;
}
}
}
@ -1552,10 +1572,9 @@ static void pnv_phb4_realize(DeviceState *dev, Error **errp)
char name[32];
/* User created PHB */
if (!phb->stack) {
if (!phb->pec) {
PnvMachineState *pnv = PNV_MACHINE(qdev_get_machine());
PnvChip *chip = pnv_get_chip(pnv, phb->chip_id);
PnvPhb4PecClass *pecc;
BusState *s;
if (!chip) {
@ -1563,23 +1582,12 @@ static void pnv_phb4_realize(DeviceState *dev, Error **errp)
return;
}
phb->stack = pnv_phb4_get_stack(chip, phb, &local_err);
phb->pec = pnv_phb4_get_pec(chip, phb, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
/* All other phb properties but 'version' are already set */
pecc = PNV_PHB4_PEC_GET_CLASS(phb->stack->pec);
object_property_set_int(OBJECT(phb), "version", pecc->version,
&error_fatal);
/*
* Assign stack->phb since pnv_phb4_update_regions() uses it
* to access the phb.
*/
phb->stack->phb = phb;
/*
* Reparent user created devices to the chip to build
* the device tree correctly.
@ -1624,12 +1632,6 @@ static void pnv_phb4_realize(DeviceState *dev, Error **errp)
pci_setup_iommu(pci->bus, pnv_phb4_dma_iommu, phb);
pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
/* Add a single Root port if running with defaults */
if (defaults_enabled()) {
pnv_phb_attach_root_port(PCI_HOST_BRIDGE(phb),
TYPE_PNV_PHB4_ROOT_PORT);
}
/* Setup XIVE Source */
if (phb->big_phb) {
nr_irqs = PNV_PHB4_MAX_INTs;
@ -1680,9 +1682,8 @@ static void pnv_phb4_xive_notify(XiveNotifier *xf, uint32_t srcno)
static Property pnv_phb4_properties[] = {
DEFINE_PROP_UINT32("index", PnvPHB4, phb_id, 0),
DEFINE_PROP_UINT32("chip-id", PnvPHB4, chip_id, 0),
DEFINE_PROP_UINT64("version", PnvPHB4, version, 0),
DEFINE_PROP_LINK("stack", PnvPHB4, stack, TYPE_PNV_PHB4_PEC_STACK,
PnvPhb4PecStack *),
DEFINE_PROP_LINK("pec", PnvPHB4, pec, TYPE_PNV_PHB4_PEC,
PnvPhb4PecState *),
DEFINE_PROP_END_OF_LIST(),
};

@@ -112,15 +112,28 @@ static const MemoryRegionOps pnv_pec_pci_xscom_ops = {
.endianness = DEVICE_BIG_ENDIAN,
};
static void pnv_pec_instance_init(Object *obj)
static void pnv_pec_default_phb_realize(PnvPhb4PecState *pec,
int stack_no,
Error **errp)
{
PnvPhb4PecState *pec = PNV_PHB4_PEC(obj);
int i;
PnvPHB4 *phb = PNV_PHB4(qdev_new(TYPE_PNV_PHB4));
int phb_id = pnv_phb4_pec_get_phb_id(pec, stack_no);
for (i = 0; i < PHB4_PEC_MAX_STACKS; i++) {
object_initialize_child(obj, "stack[*]", &pec->stacks[i],
TYPE_PNV_PHB4_PEC_STACK);
object_property_set_link(OBJECT(phb), "pec", OBJECT(pec),
&error_abort);
object_property_set_int(OBJECT(phb), "chip-id", pec->chip_id,
&error_fatal);
object_property_set_int(OBJECT(phb), "index", phb_id,
&error_fatal);
if (!sysbus_realize(SYS_BUS_DEVICE(phb), errp)) {
return;
}
/* Add a single Root port if running with defaults */
pnv_phb_attach_root_port(PCI_HOST_BRIDGE(phb),
PNV_PHB4_PEC_GET_CLASS(pec)->rp_model);
}
static void pnv_pec_realize(DeviceState *dev, Error **errp)
@@ -135,22 +148,14 @@ static void pnv_pec_realize(DeviceState *dev, Error **errp)
return;
}
pec->num_stacks = pecc->num_stacks[pec->index];
pec->num_phbs = pecc->num_phbs[pec->index];
/* Create stacks */
for (i = 0; i < pec->num_stacks; i++) {
PnvPhb4PecStack *stack = &pec->stacks[i];
Object *stk_obj = OBJECT(stack);
object_property_set_int(stk_obj, "stack-no", i, &error_abort);
object_property_set_link(stk_obj, "pec", OBJECT(pec), &error_abort);
if (!qdev_realize(DEVICE(stk_obj), NULL, errp)) {
return;
/* Create PHBs if running with defaults */
if (defaults_enabled()) {
for (i = 0; i < pec->num_phbs; i++) {
pnv_pec_default_phb_realize(pec, i, errp);
}
}
for (; i < PHB4_PEC_MAX_STACKS; i++) {
object_unparent(OBJECT(&pec->stacks[i]));
}
/* Initialize the XSCOM regions for the PEC registers */
snprintf(name, sizeof(name), "xscom-pec-%d.%d-nest", pec->chip_id,
@@ -195,7 +200,7 @@ static int pnv_pec_dt_xscom(PnvXScomInterface *dev, void *fdt,
_FDT((fdt_setprop(fdt, offset, "compatible", pecc->compat,
pecc->compat_size)));
for (i = 0; i < pec->num_stacks; i++) {
for (i = 0; i < pec->num_phbs; i++) {
int phb_id = pnv_phb4_pec_get_phb_id(pec, i);
int stk_offset;
@@ -231,11 +236,11 @@ static uint32_t pnv_pec_xscom_nest_base(PnvPhb4PecState *pec)
}
/*
* PEC0 -> 1 stack
* PEC1 -> 2 stacks
* PEC2 -> 3 stacks
* PEC0 -> 1 phb
* PEC1 -> 2 phb
* PEC2 -> 3 phbs
*/
static const uint32_t pnv_pec_num_stacks[] = { 1, 2, 3 };
static const uint32_t pnv_pec_num_phbs[] = { 1, 2, 3 };
static void pnv_pec_class_init(ObjectClass *klass, void *data)
{
@@ -260,14 +265,14 @@ static void pnv_pec_class_init(ObjectClass *klass, void *data)
pecc->stk_compat = stk_compat;
pecc->stk_compat_size = sizeof(stk_compat);
pecc->version = PNV_PHB4_VERSION;
pecc->num_stacks = pnv_pec_num_stacks;
pecc->num_phbs = pnv_pec_num_phbs;
pecc->rp_model = TYPE_PNV_PHB4_ROOT_PORT;
}
static const TypeInfo pnv_pec_type_info = {
.name = TYPE_PNV_PHB4_PEC,
.parent = TYPE_DEVICE,
.instance_size = sizeof(PnvPhb4PecState),
.instance_init = pnv_pec_instance_init,
.class_init = pnv_pec_class_init,
.class_size = sizeof(PnvPhb4PecClass),
.interfaces = (InterfaceInfo[]) {
@@ -276,73 +281,9 @@ static const TypeInfo pnv_pec_type_info = {
}
};
static void pnv_pec_stk_default_phb_realize(PnvPhb4PecStack *stack,
Error **errp)
{
PnvPhb4PecState *pec = stack->pec;
PnvPhb4PecClass *pecc = PNV_PHB4_PEC_GET_CLASS(pec);
int phb_id = pnv_phb4_pec_get_phb_id(pec, stack->stack_no);
stack->phb = PNV_PHB4(qdev_new(TYPE_PNV_PHB4));
object_property_set_int(OBJECT(stack->phb), "chip-id", pec->chip_id,
&error_fatal);
object_property_set_int(OBJECT(stack->phb), "index", phb_id,
&error_fatal);
object_property_set_int(OBJECT(stack->phb), "version", pecc->version,
&error_fatal);
object_property_set_link(OBJECT(stack->phb), "stack", OBJECT(stack),
&error_abort);
if (!sysbus_realize(SYS_BUS_DEVICE(stack->phb), errp)) {
return;
}
}
static void pnv_pec_stk_realize(DeviceState *dev, Error **errp)
{
PnvPhb4PecStack *stack = PNV_PHB4_PEC_STACK(dev);
if (!defaults_enabled()) {
return;
}
pnv_pec_stk_default_phb_realize(stack, errp);
}
static Property pnv_pec_stk_properties[] = {
DEFINE_PROP_UINT32("stack-no", PnvPhb4PecStack, stack_no, 0),
DEFINE_PROP_LINK("pec", PnvPhb4PecStack, pec, TYPE_PNV_PHB4_PEC,
PnvPhb4PecState *),
DEFINE_PROP_END_OF_LIST(),
};
static void pnv_pec_stk_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
device_class_set_props(dc, pnv_pec_stk_properties);
dc->realize = pnv_pec_stk_realize;
dc->user_creatable = false;
/* TODO: reset regs ? */
}
static const TypeInfo pnv_pec_stk_type_info = {
.name = TYPE_PNV_PHB4_PEC_STACK,
.parent = TYPE_DEVICE,
.instance_size = sizeof(PnvPhb4PecStack),
.class_init = pnv_pec_stk_class_init,
.interfaces = (InterfaceInfo[]) {
{ TYPE_PNV_XSCOM_INTERFACE },
{ }
}
};
static void pnv_pec_register_types(void)
{
type_register_static(&pnv_pec_type_info);
type_register_static(&pnv_pec_stk_type_info);
}
type_init(pnv_pec_register_types);
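
The per-PEC PHB counts in pnv_pec_num_phbs[] and the nested lookup loop in pnv_phb4_get_pec() can be modelled outside QEMU. The Python sketch below is not QEMU code. It assumes that pnv_phb4_pec_get_phb_id() numbers PHBs consecutively across PECs (PEC0 owns PHB 0, PEC1 owns PHBs 1-2, PEC2 owns PHBs 3-5); the body of that function is not shown in this diff. The names pec_get_phb_id and find_pec_for_phb are hypothetical.

```python
# Illustrative model only; not part of QEMU.
# PHB counts per PEC, copied from pnv_pec_num_phbs[] in the diff.
PEC_NUM_PHBS = [1, 2, 3]


def pec_get_phb_id(pec_index, stack_no):
    """Return the chip-wide PHB id for a PHB slot within a PEC.

    Assumption: ids are assigned consecutively, so the id is the number
    of PHBs owned by all lower-numbered PECs plus the slot number.
    """
    return sum(PEC_NUM_PHBS[:pec_index]) + stack_no


def find_pec_for_phb(phb_index):
    """Mirror the nested loop in pnv_phb4_get_pec(): scan every PEC and
    every PHB slot it supports; return the owning PEC index or None."""
    for pec_index, num_phbs in enumerate(PEC_NUM_PHBS):
        for slot in range(num_phbs):
            if pec_get_phb_id(pec_index, slot) == phb_index:
                return pec_index
    return None
```

Under that assumption, a user-created PHB with index 4 resolves to PEC2, and an index of 6 or above matches no PEC, which corresponds to the error path taken after the loop in pnv_phb4_get_pec().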

@@ -84,6 +84,9 @@ struct PnvPHB4 {
uint64_t version;
/* The owner PEC */
PnvPhb4PecState *pec;
char bus_path[8];
/* Main register images */
@@ -107,6 +110,29 @@ struct PnvPHB4 {
MemoryRegion pci_mmio;
MemoryRegion pci_io;
/* PCI registers (excluding pass-through) */
#define PHB4_PEC_PCI_STK_REGS_COUNT 0xf
uint64_t pci_regs[PHB4_PEC_PCI_STK_REGS_COUNT];
MemoryRegion pci_regs_mr;
/* Nest registers */
#define PHB4_PEC_NEST_STK_REGS_COUNT 0x17
uint64_t nest_regs[PHB4_PEC_NEST_STK_REGS_COUNT];
MemoryRegion nest_regs_mr;
/* PHB pass-through XSCOM */
MemoryRegion phb_regs_mr;
/* Memory windows from PowerBus to PHB */
MemoryRegion phbbar;
MemoryRegion intbar;
MemoryRegion mmbar0;
MemoryRegion mmbar1;
uint64_t mmio0_base;
uint64_t mmio0_size;
uint64_t mmio1_base;
uint64_t mmio1_size;
/* On-chip IODA tables */
uint64_t ioda_LIST[PNV_PHB4_MAX_LSIs];
uint64_t ioda_MIST[PNV_PHB4_MAX_MIST];
@@ -125,8 +151,6 @@ struct PnvPHB4 {
XiveSource xsrc;
qemu_irq *qirqs;
PnvPhb4PecStack *stack;
QLIST_HEAD(, PnvPhb4DMASpace) dma_spaces;
};
@@ -140,49 +164,6 @@ extern const MemoryRegionOps pnv_phb4_xscom_ops;
#define TYPE_PNV_PHB4_PEC "pnv-phb4-pec"
OBJECT_DECLARE_TYPE(PnvPhb4PecState, PnvPhb4PecClass, PNV_PHB4_PEC)
#define TYPE_PNV_PHB4_PEC_STACK "pnv-phb4-pec-stack"
OBJECT_DECLARE_SIMPLE_TYPE(PnvPhb4PecStack, PNV_PHB4_PEC_STACK)
/* Per-stack data */
struct PnvPhb4PecStack {
DeviceState parent;
/* My own stack number */
uint32_t stack_no;
/* Nest registers */
#define PHB4_PEC_NEST_STK_REGS_COUNT 0x17
uint64_t nest_regs[PHB4_PEC_NEST_STK_REGS_COUNT];
MemoryRegion nest_regs_mr;
/* PCI registers (excluding pass-through) */
#define PHB4_PEC_PCI_STK_REGS_COUNT 0xf
uint64_t pci_regs[PHB4_PEC_PCI_STK_REGS_COUNT];
MemoryRegion pci_regs_mr;
/* PHB pass-through XSCOM */
MemoryRegion phb_regs_mr;
/* Memory windows from PowerBus to PHB */
MemoryRegion mmbar0;
MemoryRegion mmbar1;
MemoryRegion phbbar;
MemoryRegion intbar;
uint64_t mmio0_base;
uint64_t mmio0_size;
uint64_t mmio1_base;
uint64_t mmio1_size;
/* The owner PEC */
PnvPhb4PecState *pec;
/*
* PHB4 pointer. pnv_phb4_update_regions() needs to access
* the PHB4 via a PnvPhb4PecStack pointer.
*/
PnvPHB4 *phb;
};
struct PnvPhb4PecState {
DeviceState parent;
@@ -202,10 +183,8 @@ struct PnvPhb4PecState {
uint64_t pci_regs[PHB4_PEC_PCI_REGS_COUNT];
MemoryRegion pci_regs_mr;
/* Stacks */
#define PHB4_PEC_MAX_STACKS 3
uint32_t num_stacks;
PnvPhb4PecStack stacks[PHB4_PEC_MAX_STACKS];
/* PHBs */
uint32_t num_phbs;
PnvChip *chip;
};
@@ -223,7 +202,8 @@ struct PnvPhb4PecClass {
const char *stk_compat;
int stk_compat_size;
uint64_t version;
const uint32_t *num_stacks;
const uint32_t *num_phbs;
const char *rp_model;
};
#endif /* PCI_HOST_PNV_PHB4_H */

@@ -636,13 +636,13 @@
"PowerPC 7410 v1.3 (G4)")
POWERPC_DEF("7410_v1.4", CPU_POWERPC_7410_v14, 7410,
"PowerPC 7410 v1.4 (G4)")
POWERPC_DEF("7448_v1.0", CPU_POWERPC_7448_v10, 7400,
POWERPC_DEF("7448_v1.0", CPU_POWERPC_7448_v10, 7445,
"PowerPC 7448 v1.0 (G4)")
POWERPC_DEF("7448_v1.1", CPU_POWERPC_7448_v11, 7400,
POWERPC_DEF("7448_v1.1", CPU_POWERPC_7448_v11, 7445,
"PowerPC 7448 v1.1 (G4)")
POWERPC_DEF("7448_v2.0", CPU_POWERPC_7448_v20, 7400,
POWERPC_DEF("7448_v2.0", CPU_POWERPC_7448_v20, 7445,
"PowerPC 7448 v2.0 (G4)")
POWERPC_DEF("7448_v2.1", CPU_POWERPC_7448_v21, 7400,
POWERPC_DEF("7448_v2.1", CPU_POWERPC_7448_v21, 7445,
"PowerPC 7448 v2.1 (G4)")
POWERPC_DEF("7450_v1.0", CPU_POWERPC_7450_v10, 7450,
"PowerPC 7450 v1.0 (G4)")
@@ -750,7 +750,6 @@
/* PowerPC CPU aliases */
PowerPCCPUAlias ppc_cpu_aliases[] = {
{ "403", "403gc" },
{ "405", "405d4" },
{ "405cr", "405crc" },
{ "405gp", "405gpd" },

@@ -1133,7 +1133,6 @@ struct CPUPPCState {
int nb_pids; /* Number of available PID registers */
int tlb_type; /* Type of TLB we're dealing with */
ppc_tlb_t tlb; /* TLB is optional. Allocate them only if needed */
target_ulong pb[4]; /* 403 dedicated access protection registers */
bool tlb_dirty; /* Set to non-zero when modifying TLB */
bool kvm_sw_tlb; /* non-zero if KVM SW TLB API is active */
uint32_t tlb_need_flush; /* Delayed flush needed */

@@ -703,7 +703,6 @@ DEF_HELPER_FLAGS_2(store_hdecr, TCG_CALL_NO_RWG, void, env, tl)
DEF_HELPER_FLAGS_2(store_vtb, TCG_CALL_NO_RWG, void, env, tl)
DEF_HELPER_FLAGS_2(store_tbu40, TCG_CALL_NO_RWG, void, env, tl)
DEF_HELPER_2(store_hid0_601, void, env, tl)
DEF_HELPER_3(store_403_pbr, void, env, i32, tl)
DEF_HELPER_FLAGS_1(load_40x_pit, TCG_CALL_NO_RWG, tl, env)
DEF_HELPER_FLAGS_2(store_40x_pit, TCG_CALL_NO_RWG, void, env, tl)
DEF_HELPER_FLAGS_2(store_40x_tcr, TCG_CALL_NO_RWG, void, env, tl)

@@ -23,117 +23,6 @@ static void post_load_update_msr(CPUPPCState *env)
pmu_update_summaries(env);
}
static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
{
PowerPCCPU *cpu = opaque;
CPUPPCState *env = &cpu->env;
unsigned int i, j;
target_ulong sdr1;
uint32_t fpscr, vscr;
#if defined(TARGET_PPC64)
int32_t slb_nr;
#endif
target_ulong xer;
for (i = 0; i < 32; i++) {
qemu_get_betls(f, &env->gpr[i]);
}
#if !defined(TARGET_PPC64)
for (i = 0; i < 32; i++) {
qemu_get_betls(f, &env->gprh[i]);
}
#endif
qemu_get_betls(f, &env->lr);
qemu_get_betls(f, &env->ctr);
for (i = 0; i < 8; i++) {
qemu_get_be32s(f, &env->crf[i]);
}
qemu_get_betls(f, &xer);
cpu_write_xer(env, xer);
qemu_get_betls(f, &env->reserve_addr);
qemu_get_betls(f, &env->msr);
for (i = 0; i < 4; i++) {
qemu_get_betls(f, &env->tgpr[i]);
}
for (i = 0; i < 32; i++) {
union {
float64 d;
uint64_t l;
} u;
u.l = qemu_get_be64(f);
*cpu_fpr_ptr(env, i) = u.d;
}
qemu_get_be32s(f, &fpscr);
env->fpscr = fpscr;
qemu_get_sbe32s(f, &env->access_type);
#if defined(TARGET_PPC64)
qemu_get_betls(f, &env->spr[SPR_ASR]);
qemu_get_sbe32s(f, &slb_nr);
#endif
qemu_get_betls(f, &sdr1);
for (i = 0; i < 32; i++) {
qemu_get_betls(f, &env->sr[i]);
}
for (i = 0; i < 2; i++) {
for (j = 0; j < 8; j++) {
qemu_get_betls(f, &env->DBAT[i][j]);
}
}
for (i = 0; i < 2; i++) {
for (j = 0; j < 8; j++) {
qemu_get_betls(f, &env->IBAT[i][j]);
}
}
qemu_get_sbe32s(f, &env->nb_tlb);
qemu_get_sbe32s(f, &env->tlb_per_way);
qemu_get_sbe32s(f, &env->nb_ways);
qemu_get_sbe32s(f, &env->last_way);
qemu_get_sbe32s(f, &env->id_tlbs);
qemu_get_sbe32s(f, &env->nb_pids);
if (env->tlb.tlb6) {
/* XXX assumes 6xx */
for (i = 0; i < env->nb_tlb; i++) {
qemu_get_betls(f, &env->tlb.tlb6[i].pte0);
qemu_get_betls(f, &env->tlb.tlb6[i].pte1);
qemu_get_betls(f, &env->tlb.tlb6[i].EPN);
}
}
for (i = 0; i < 4; i++) {
qemu_get_betls(f, &env->pb[i]);
}
for (i = 0; i < 1024; i++) {
qemu_get_betls(f, &env->spr[i]);
}
if (!cpu->vhyp) {
ppc_store_sdr1(env, sdr1);
}
qemu_get_be32s(f, &vscr);
ppc_store_vscr(env, vscr);
qemu_get_be64s(f, &env->spe_acc);
qemu_get_be32s(f, &env->spe_fscr);
qemu_get_betls(f, &env->msr_mask);
qemu_get_be32s(f, &env->flags);
qemu_get_sbe32s(f, &env->error_code);
qemu_get_be32s(f, &env->pending_interrupts);
qemu_get_be32s(f, &env->irq_input_state);
for (i = 0; i < POWERPC_EXCP_NB; i++) {
qemu_get_betls(f, &env->excp_vectors[i]);
}
qemu_get_betls(f, &env->excp_prefix);
qemu_get_betls(f, &env->ivor_mask);
qemu_get_betls(f, &env->ivpr_mask);
qemu_get_betls(f, &env->hreset_vector);
qemu_get_betls(f, &env->nip);
qemu_get_sbetl(f); /* Discard unused hflags */
qemu_get_sbetl(f); /* Discard unused hflags_nmsr */
qemu_get_sbe32(f); /* Discard unused mmu_idx */
qemu_get_sbe32(f); /* Discard unused power_mode */
post_load_update_msr(env);
return 0;
}
static int get_avr(QEMUFile *f, void *pv, size_t size,
const VMStateField *field)
{
@@ -709,25 +598,6 @@ static bool tlbemb_needed(void *opaque)
return env->nb_tlb && (env->tlb_type == TLB_EMB);
}
static bool pbr403_needed(void *opaque)
{
PowerPCCPU *cpu = opaque;
uint32_t pvr = cpu->env.spr[SPR_PVR];
return (pvr & 0xffff0000) == 0x00200000;
}
static const VMStateDescription vmstate_pbr403 = {
.name = "cpu/pbr403",
.version_id = 1,
.minimum_version_id = 1,
.needed = pbr403_needed,
.fields = (VMStateField[]) {
VMSTATE_UINTTL_ARRAY(env.pb, PowerPCCPU, 4),
VMSTATE_END_OF_LIST()
},
};
static const VMStateDescription vmstate_tlbemb = {
.name = "cpu/tlb6xx",
.version_id = 1,
@@ -739,13 +609,8 @@ static const VMStateDescription vmstate_tlbemb = {
env.nb_tlb,
vmstate_tlbemb_entry,
ppcemb_tlb_t),
/* 403 protection registers */
VMSTATE_END_OF_LIST()
},
.subsections = (const VMStateDescription*[]) {
&vmstate_pbr403,
NULL
}
};
static const VMStateDescription vmstate_tlbmas_entry = {
@@ -808,7 +673,6 @@ const VMStateDescription vmstate_ppc_cpu = {
.version_id = 5,
.minimum_version_id = 5,
.minimum_version_id_old = 4,
.load_state_old = cpu_load_old,
.pre_save = cpu_pre_save,
.post_load = cpu_post_load,
.fields = (VMStateField[]) {

@@ -226,15 +226,6 @@ void helper_store_hid0_601(CPUPPCState *env, target_ulong val)
}
}
void helper_store_403_pbr(CPUPPCState *env, uint32_t num, target_ulong value)
{
if (likely(env->pb[num] != value)) {
env->pb[num] = value;
/* Should be optimized */
tlb_flush(env_cpu(env));
}
}
void helper_store_40x_dbcr0(CPUPPCState *env, target_ulong val)
{
/* Bits 26 & 27 affect single-stepping. */

@@ -911,22 +911,8 @@ void spr_write_booke_tsr(DisasContext *ctx, int sprn, int gprn)
}
#endif
/* PowerPC 403 specific registers */
/* PBL1 / PBU1 / PBL2 / PBU2 */
/* PIR */
#if !defined(CONFIG_USER_ONLY)
void spr_read_403_pbr(DisasContext *ctx, int gprn, int sprn)
{
tcg_gen_ld_tl(cpu_gpr[gprn], cpu_env,
offsetof(CPUPPCState, pb[sprn - SPR_403_PBL1]));
}
void spr_write_403_pbr(DisasContext *ctx, int sprn, int gprn)
{
TCGv_i32 t0 = tcg_const_i32(sprn - SPR_403_PBL1);
gen_helper_store_403_pbr(cpu_env, t0, cpu_gpr[gprn]);
tcg_temp_free_i32(t0);
}
void spr_write_pir(DisasContext *ctx, int sprn, int gprn)
{
TCGv t0 = tcg_temp_new();

tests/avocado/ppc_74xx.py (new file)

@@ -0,0 +1,123 @@
# Smoke tests for 74xx cpus (aka G4).
#
# Copyright (c) 2021, IBM Corp.
#
# This work is licensed under the terms of the GNU GPL, version 2 or
# later. See the COPYING file in the top-level directory.

from avocado_qemu import QemuSystemTest
from avocado_qemu import wait_for_console_pattern

class ppc74xxCpu(QemuSystemTest):
    """
    :avocado: tags=arch:ppc
    """
    timeout = 5

    def test_ppc_7400(self):
        """
        :avocado: tags=cpu:7400
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7410(self):
        """
        :avocado: tags=cpu:7410
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,74xx')

    def test_ppc_7441(self):
        """
        :avocado: tags=cpu:7441
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7445(self):
        """
        :avocado: tags=cpu:7445
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7447(self):
        """
        :avocado: tags=cpu:7447
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7447a(self):
        """
        :avocado: tags=cpu:7447a
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7448(self):
        """
        :avocado: tags=cpu:7448
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,MPC86xx')

    def test_ppc_7450(self):
        """
        :avocado: tags=cpu:7450
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7451(self):
        """
        :avocado: tags=cpu:7451
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7455(self):
        """
        :avocado: tags=cpu:7455
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7457(self):
        """
        :avocado: tags=cpu:7457
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')

    def test_ppc_7457a(self):
        """
        :avocado: tags=cpu:7457a
        """
        self.vm.set_console()
        self.vm.launch()
        wait_for_console_pattern(self, '>> OpenBIOS')
        wait_for_console_pattern(self, '>> CPU type PowerPC,G4')
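
The twelve tests above differ only in their cpu tag and in the "CPU type" console line they wait for. A minimal sketch of that mapping as a lookup table, so the expectations can be read at a glance. EXPECTED_CPU_TYPE and expected_patterns are hypothetical names; they are not part of avocado_qemu or of this commit, and the values are copied from the test file as shown here.

```python
# Illustrative summary of the expectations encoded in ppc_74xx.py.
# Keys are the avocado cpu tags; values are the suffix of the OpenBIOS
# "CPU type PowerPC,<suffix>" line that each test waits for.
EXPECTED_CPU_TYPE = {
    '7400': 'G4',
    '7410': '74xx',
    '7441': 'G4',
    '7445': 'G4',
    '7447': 'G4',
    '7447a': 'G4',
    '7448': 'MPC86xx',
    '7450': 'G4',
    '7451': 'G4',
    '7455': 'G4',
    '7457': 'G4',
    '7457a': 'G4',
}


def expected_patterns(cpu):
    """Return the two console patterns a test for `cpu` waits for, in order."""
    return ['>> OpenBIOS', '>> CPU type PowerPC,' + EXPECTED_CPU_TYPE[cpu]]
```

Only the 7410 and 7448 entries deviate from the common 'G4' string.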