Commit Graph

759 Commits

Author SHA1 Message Date
Igor Mammedov a0258e4afa ppc/{ppc440_bamboo, sam460ex}: drop RAM size fixup
If user provided non-sense RAM size, board will complain and
continue running with max RAM size supported or sometimes
crash like this:
  %QEMU -M bamboo -m 1
    exec.c:1926: find_ram_offset: Assertion `size != 0' failed.
    Aborted (core dumped)
Also RAM is going to be allocated by generic code, so it won't be
possible for board to fix things up for user.

Make it error message and exit to force user fix CLI,
instead of accepting non-sense CLI values.
That also fixes crash issue, since wrongly calculated size
isn't used to allocate RAM

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200219160953.13771-66-imammedo@redhat.com>
2020-02-19 16:50:00 +00:00
Aravinda Prasad 2500fb423a migration: Include migration support for machine check handling
This patch includes migration support for machine check
handling. Especially this patch blocks VM migration
requests until the machine check error handling is
complete as these errors are specific to the source
hardware and is irrelevant on the target hardware.

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[Do not set FWNMI cap in post_load, now its done in .apply hook]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Message-Id: <20200130184423.20519-7-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:11 +11:00
Aravinda Prasad f03496bc12 ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
This patch adds support in QEMU to handle "ibm,nmi-register"
and "ibm,nmi-interlock" RTAS calls.

The machine check notification address is saved when the
OS issues "ibm,nmi-register" RTAS call.

This patch also handles the case when multiple processors
experience machine check at or about the same time by
handling "ibm,nmi-interlock" call. In such cases, as per
PAPR, subsequent processors serialize waiting for the first
processor to issue the "ibm,nmi-interlock" call. The second
processor that also received a machine check error waits
till the first processor is done reading the error log.
The first processor issues "ibm,nmi-interlock" call
when the error log is consumed.

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[Register fwnmi RTAS calls in core_rtas_register_types()
 where other RTAS calls are registered]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Message-Id: <20200130184423.20519-6-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:11 +11:00
Aravinda Prasad 81fe70e443 target/ppc: Build rtas error log upon an MCE
Upon a machine check exception (MCE) in a guest address space,
KVM causes a guest exit to enable QEMU to build and pass the
error to the guest in the PAPR defined rtas error log format.

This patch builds the rtas error log, copies it to the rtas_addr
and then invokes the guest registered machine check handler. The
handler in the guest takes suitable action(s) depending on the type
and criticality of the error. For example, if an error is
unrecoverable memory corruption in an application inside the
guest, then the guest kernel sends a SIGBUS to the application.
For recoverable errors, the guest performs recovery actions and
logs the error.

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[Assume SLOF has allocated enough room for rtas error log]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200130184423.20519-5-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:10 +11:00
Aravinda Prasad 9ac703ac5f target/ppc: Handle NMI guest exit
Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.

[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
    (e20bbd3d and related commits)

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200130184423.20519-4-ganeshgr@linux.ibm.com>
[dwg: #ifdefs to fix compile for 32-bit target]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:10 +11:00
Aravinda Prasad 9d953ce447 ppc: spapr: Introduce FWNMI capability
Introduce fwnmi an spapr capability and add a helper function
which tries to enable it, which would be used by following patch
of the series. This patch by itself does not change the existing
behavior.

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[eliminate cap_ppc_fwnmi, add fwnmi cap to migration state
 and reprhase the commit message]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200130184423.20519-3-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:10 +11:00
Cédric Le Goater 9ae1329ee2 ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge
This is a model of the PCIe Host Bridge (PHB3) found on a POWER8
processor. It includes the PowerBus logic interface (PBCQ), IOMMU
support, a single PCIe Gen.3 Root Complex, and support for MSI and LSI
interrupt sources as found on a POWER8 system using the XICS interrupt
controller.

The POWER8 processor comes in different flavors: Venice, Murano,
Naple, each having a different number of PHBs. To make things simpler,
the models provides 3 PHB3 per chip. Some platforms, like the
Firestone, can also couple PHBs on the first chip to provide more
bandwidth but this is too specific to model in QEMU.

XICS requires some adjustment to support the PHB3 MSI. The changes are
provided here but they could be decoupled in prereq patches.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200127144506.11132-3-clg@kaod.org>
[dwg: Use device_class_set_props()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-02 14:07:57 +11:00
Benjamin Herrenschmidt 4f9924c4d4 ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge
These changes introduces models for the PCIe Host Bridge (PHB4) of the
POWER9 processor. It includes the PowerBus logic interface (PBCQ),
IOMMU support, a single PCIe Gen.4 Root Complex, and support for MSI
and LSI interrupt sources as found on a POWER9 system using the XIVE
interrupt controller.

POWER9 processor comes with 3 PHB4 PEC (PCI Express Controller) and
each PEC can have several PHBs. By default,

  * PEC0 provides 1 PHB  (PHB0)
  * PEC1 provides 2 PHBs (PHB1 and PHB2)
  * PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5)

Each PEC has a set  "global" registers and some "per-stack" (per-PHB)
registers. Those are organized in two XSCOM ranges, the "Nest" range
and the "PCI" range, each range contains both some "PEC" registers and
some "per-stack" registers.

No default device layout is provided and PCI devices can be added on
any of the available PCIe Root Port (pcie.0 .. 2 of a Power9 chip)
with address 0x0 as the firwware (skiboot) only accepts a single
device per root port. To run a simple system with a network and a
storage adapters, use a command line options such as :

  -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
  -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0

  -device megasas,id=scsi0,bus=pcie.1,addr=0x0
  -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2

If more are needed, include a bridge.

Multi chip is supported, each chip adding its set of PHB4 controllers
and its PCI busses. The model doesn't emulate the EEH error handling.

This model is not ready for hotplug yet.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[ clg: - numerous cleanups
       - commit log
       - fix for broken LSI support
       - PHB pic printinfo
       - large QOM rework ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200127144506.11132-2-clg@kaod.org>
[dwg: Use device_class_set_props()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-02 14:07:57 +11:00
Stefan Berger 864674fa29 spapr: Implement get_dt_compatible() callback
For devices that cannot be statically initialized, implement a
get_dt_compatible() callback that allows us to ask the device for
the 'compatible' value.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200121152935.649898-3-stefanb@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-02 14:07:57 +11:00
Cédric Le Goater 08c3f3a734 ppc/pnv: Add support for "hostboot" mode
When the "hb-mode" option is activated on the powernv machine, the
firmware is mapped at 0x8000000 and the HRMOR of the HW threads are
set to the same address.

The PNOR mapping on the FW address space of the LPC bus is left enabled
to let the firmware load any other images required to boot the host.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200127144154.10170-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-02 14:07:57 +11:00
Thomas Huth b2ce76a073 hw/ppc/prep: Remove the deprecated "prep" machine and the OpenHackware BIOS
It's been deprecated since QEMU v3.1. The 40p machine should be
used nowadays instead.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200114114617.28854-1-thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-02 14:07:57 +11:00
Cédric Le Goater fc2527fb02 ppc/pnv: fix check on return value of blk_getlength()
blk_getlength() returns an int64_t but the result is stored in a
uint32_t. Errors (negative values) won't be caught by the check in
pnv_pnor_realize() and blk_blockalign() will allocate a very large
buffer in such cases.

Fixes Coverity issue CID 1412226.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200107171809.15556-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 12:01:14 +11:00
Greg Kurz 806fed593d pnv/xive: Deduce the PnvXive pointer from XiveTCTX::xptr
And use it instead of reaching out to the machine. This allows to get
rid of pnv_get_chip().

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Cédric Le Goater 479509463b xive: Add a "presenter" link property to the TCTX object
This will be used in subsequent patches to access the XIVE associated to
a TCTX without reaching out to the machine through qdev_get_machine().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[ groug: - split patch
         - write subject and changelog ]
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Greg Kurz d8137bb729 ppc/pnv: Add a "pnor" const link property to the BMC internal simulator
This allows to get rid of a call to qdev_get_machine().

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Greg Kurz 764f9b2559 ppc/pnv: Add an "nr-threads" property to the base chip class
Set it at chip creation and forward it to the cores. This allows to drop
a call to qdev_get_machine().

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200106145645.4539-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Greg Kurz d1214b819f spapr, pnv, xive: Add a "xive-fabric" link to the XIVE router
In order to get rid of qdev_get_machine(), first add a pointer to the
XIVE fabric under the XIVE router and make it configurable through a
QOM link property.

Configure it in the spapr and pnv machine. In the case of pnv, the XIVE
routers are under the chip, so this is done with a QOM alias property of
the POWER9 pnv chip.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106145645.4539-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Greg Kurz 0da41d3c5a pnv/xive: Use device_class_set_parent_realize()
The XIVE router base class currently inherits an empty realize hook
from the sysbus device base class, but it will soon implement one
of its own to perform some sanity checks. Do the preliminary plumbing
to have it called.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200106145645.4539-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Cédric Le Goater 245cdb7f54 ppc/pnv: Introduce a "xics" property under the POWER8 chip
POWER8 is the only chip using the XICS interface. Add a "xics" link
and a XICSFabric attribute under this chip to remove the use of
qdev_get_machine()

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200106145645.4539-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Greg Kurz 6cc64796f2 spapr/xive: Use device_class_set_parent_realize()
The XIVE router base class currently inherits an empty realize hook
from the sysbus device base class, but it will soon implement one
of its own to perform some sanity checks. Do the preliminary plumbing
to have it called.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191219181155.32530-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Greg Kurz 5084c8b763 ppc/pnv: Drop PnvChipClass::type
It isn't used anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623844102.360005.12070225703151669294.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:11 +11:00
Greg Kurz 70c059e926 ppc/pnv: Introduce PnvChipClass::xscom_pcba() method
The XSCOM bus is implemented with a QOM interface, which is mostly
generic from a CPU type standpoint, except for the computation of
addresses on the Pervasive Connect Bus (PCB) network. This is handled
by the pnv_xscom_pcba() function with a switch statement based on
the chip_type class level attribute of the CPU chip.

This can be achieved using QOM. Also the address argument is masked with
PNV_XSCOM_SIZE - 1, which is for POWER8 only. Addresses may have different
sizes with other CPU types. Have each CPU chip type handle the appropriate
computation with a QOM xscom_pcba() method.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623843543.360005.13996472463887521794.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:11 +11:00
Greg Kurz 3caf7bd0a2 ppc/pnv: Drop pnv_chip_is_power9() and pnv_chip_is_power10() helpers
They aren't used anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623842986.360005.1787401623906380181.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:11 +11:00
Greg Kurz c396c58a02 ppc/pnv: Pass content of the "compatible" property to pnv_dt_xscom()
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the
"compatible" property. Just pass the compat string and its size as
arguments.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623842430.360005.9513965612524265862.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:11 +11:00
Greg Kurz 3f5b45ca4f ppc/pnv: Pass XSCOM base address and address size to pnv_dt_xscom()
Since pnv_dt_xscom() is called from chip specific dt_populate() hooks,
it shouldn't have to guess the chip type in order to populate the "reg"
property. Just pass the base address and address size as arguments.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623841868.360005.17577624823547136435.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:11 +11:00
Greg Kurz c4b2c40c0e ppc/pnv: Introduce PnvChipClass::xscom_core_base() method
The pnv_chip_core_realize() function configures the XSCOM MMIO subregion
for each core of a single chip. The base address of the subregion depends
on the CPU type. Its computation is currently open-code using the
pnv_chip_is_powerXX() helpers. This can be achieved with QOM. Introduce
a method for this in the base chip class and implement it in child classes.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623841311.360005.4705705734873339545.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:11 +11:00
Greg Kurz 85913070a6 ppc/pnv: Introduce PnvChipClass::intc_print_info() method
The pnv_pic_print_info() callback checks the type of the chip in order
to forward to the request appropriate interrupt controller. This can
be achieved with QOM. Introduce a method for this in the base chip class
and implement it in child classes.

This also prepares ground for the upcoming interrupt controller of POWER10
chips.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623840755.360005.5002022339473369934.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:10 +11:00
Greg Kurz acc39abb31 ppc/pnv: Drop pnv_is_power9() and pnv_is_power10() helpers
They aren't used anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623840200.360005.1300941274565357363.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:10 +11:00
Greg Kurz 7a90c6a1b6 ppc/pnv: Introduce PnvMachineClass::dt_power_mgt()
We add an extra node to advertise power management on some machines,
namely powernv9 and powernv10. This is achieved by using the
pnv_is_power9() and pnv_is_power10() helpers.

This can be achieved with QOM. Add a method to the base class for
powernv machines and have it implemented by machine types that
support power management instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623839642.360005.9243510140436689941.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:59:10 +11:00
Greg Kurz d76f2da7a5 ppc/pnv: Introduce PnvMachineClass and PnvMachineClass::compat
The pnv_dt_create() function generates different contents for the
"compatible" property of the root node in the DT, depending on the
CPU type. This is open coded with multiple ifs using pnv_is_powerXX()
helpers.

It seems cleaner to achieve with QOM. Introduce a base class for the
powernv machine and a compat attribute that each child class can use
to provide the value for the "compatible" property.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623839085.360005.4046508784077843216.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Folded in small fix Greg spotted after posting]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:58:49 +11:00
Greg Kurz 248e4e924e ppc/pnv: Drop PnvPsiClass::chip_type
It isn't used anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623838530.360005.15470128760871845396.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Greg Kurz 41c4ef7009 ppc/pnv: Introduce PnvPsiClass::compat
The Processor Service Interface (PSI) model has a chip_type class level
attribute, which is used to generate the content of the "compatible" DT
property according to the CPU type.

Since the PSI model already has specialized classes for each supported
CPU type, it seems cleaner to achieve this with QOM. Provide the content
of the "compatible" property with a new class level attribute.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157623837974.360005.14706607446188964477.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Greg Kurz aeb7a330f4 ppc: Drop useless extern annotation for functions
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <157623837421.360005.412120366652768311.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 3a1b70b66b ppc/pnv: Fix OCC common area region mapping
The OCC common area is mapped at a unique address on the system and
each OCC is assigned a segment to expose its sensor data :

  -------------------------------------------------------------------------
  | Start (Offset from | End           | Size     |Description            |
  | BAR2 base address) |               |          |                       |
  -------------------------------------------------------------------------
  |    0x00580000      |  0x005A57FF   |150kB     |OCC 0 Sensor Data Block|
  |    0x005A5800      |  0x005CAFFF   |150kB     |OCC 1 Sensor Data Block|
  |        :           |       :       |  :       |            :          |
  |    0x00686800      |  0x006ABFFF   |150kB     |OCC 7 Sensor Data Block|
  |    0x006AC000      |  0x006FFFFF   |336kB     |Reserved               |
  -------------------------------------------------------------------------

Maximum size is 1.5MB.

We could define a "OCC common area" memory region at the machine level
and sub regions for each OCC. But it adds some extra complexity to the
models. Fix the current layout with a simpler model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191211082912.2625-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 8f09231631 ppc/pnv: Introduce PBA registers
The PBA bridge unit (Power Bus Access) connects the OCC (On Chip
Controller) to the Power bus and System Memory. The PBA is used to
gather sensor data, for power management, for sleep states, for
initial boot, among other things.

The PBA logic provides a set of four registers PowerBus Access Base
Address Registers (PBABAR0..3) which map the OCC address space to the
PowerBus space. These registers are setup by the initial FW and define
the PowerBus Range of system memory that can be accessed by PBA.

The current modeling of the PBABAR registers is done under the common
XSCOM handlers. We introduce a specific XSCOM regions for these
registers and fix :

 - BAR sizes and BAR masks
 - The mapping of the OCC common area. It is common to all chips and
   should be mapped once.  We will address per-OCC area in the next
   change.
 - OCC common area is in BAR 3 on P8

Inspired by previous work of Balamuruhan S <bala24@linux.ibm.com>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191211082912.2625-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Greg Kurz 90cce00c7b ppc/pnv: Make PnvXScomInterface an incomplete type
PnvXScomInterface is an interface instance. It should never be
dereferenced. Drop the dummy type definition for extra safety,
which is the common practice with QOM interfaces.

While here also convert the bogus OBJECT_CHECK() to INTERFACE_CHECK().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157608025541.186670.1577861507610404326.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Suraj Jitindar Singh 5cc7e69f6d target/ppc: Work [S]PURR implementation and add HV support
The Processor Utilisation of Resources Register (PURR) and Scaled
Processor Utilisation of Resources Register (SPURR) provide an estimate
of the resources used by the thread, present on POWER7 and later
processors.

Currently the [S]PURR registers simply count at the rate of the
timebase.

Preserve this behaviour but rework the implementation to store an offset
like the timebase rather than doing the calculation manually. Also allow
hypervisor write access to the register along with the currently
available read access.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[ clg: rebased on current ppc tree ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191128134700.16091-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Suraj Jitindar Singh 5d62725b2f target/ppc: Implement the VTB for HV access
The virtual timebase register (VTB) is a 64-bit register which
increments at the same rate as the timebase register, present on POWER8
and later processors.

The register is able to be read/written by the hypervisor and read by
the supervisor. All other accesses are illegal.

Currently the VTB is just an alias for the timebase (TB) register.

Implement the VTB so that is can be read/written independent of the TB.
Make use of the existing method for accessing timebase facilities where
by the compensation is stored and used to compute the value on reads/is
updated on writes.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[ clg: rebased on current ppc tree ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191128134700.16091-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 2661f6ab2b ppc/pnv: add a LPC Controller model for POWER10
Same a POWER9, only the MMIO window changes.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191205184454.10722-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 8b50ce8505 ppc/pnv: add a PSI bridge model for POWER10
The POWER10 PSIHB controller is very similar to the one on POWER9. We
should probably introduce a common PnvPsiXive object.

The ESB page size should be changed to 64k when P10 support is ready.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191205184454.10722-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 2b548a4255 ppc/pnv: Introduce a POWER10 PnvChip and a powernv10 machine
This is an empty shell with the XSCOM bus and cores. The chip controllers
will come later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191205184454.10722-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Greg Kurz 401774387a ppc: Deassert the external interrupt pin in KVM on reset
When a CPU is reset, QEMU makes sure no interrupt is pending by clearing
CPUPPCstate::pending_interrupts in ppc_cpu_reset(). In the case of a
complete machine emulation, eg. a sPAPR machine, an external interrupt
request could still be pending in KVM though, eg. an IPI. It will be
eventually presented to the guest, which is supposed to acknowledge it at
the interrupt controller. If the interrupt controller is emulated in QEMU,
either XICS or XIVE, ppc_set_irq() won't deassert the external interrupt
pin in KVM since it isn't pending anymore for QEMU. When the vCPU re-enters
the guest, the interrupt request is still pending and the vCPU will try
again to acknowledge it. This causes an infinite loop and eventually hangs
the guest.

The code has been broken since the beginning. The issue wasn't hit before
because accel=kvm,kernel-irqchip=off is an awkward setup that never got
used until recently with the LC92x IBM systems (aka, Boston).

Add a ppc_irq_reset() function to do the necessary cleanup, ie. deassert
the IRQ pins of the CPU in QEMU and most importantly the external interrupt
pin for this vCPU in KVM.

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157548861740.3650476.16879693165328764758.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
David Gibson d1d32d6255 spapr: Simplify ovec diff
spapr_ovec_diff(ov, old, new) has somewhat complex semantics.  ov is set
to those bits which are in new but not old, and it returns as a boolean
whether or not there are any bits in old but not new.

It turns out that both callers only care about the second, not the first.
This is basically equivalent to a bitmap subset operation, which is easier
to understand and implement.  So replace spapr_ovec_diff() with
spapr_ovec_subset().

Cc: Mike Roth <mdroth@linux.vnet.ibm.com>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
2019-12-17 10:39:48 +11:00
David Gibson 0c21e07354 spapr: Fold h_cas_compose_response() into h_client_architecture_support()
spapr_h_cas_compose_response() handles the last piece of the PAPR feature
negotiation process invoked via the ibm,client-architecture-support OF
call.  Its only caller is h_client_architecture_support() which handles
most of the rest of that process.

I believe it was placed in a separate file originally to handle some
fiddly dependencies between functions, but mostly it's just confusing
to have the CAS process split into two pieces like this.  Now that
compose response is simplified (by just generating the whole device
tree anew), it's cleaner to just fold it into
h_client_architecture_support().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-12-17 10:39:48 +11:00
Cédric Le Goater d302e00080 ppc/pnv: Dump the XIVE NVT table
This is useful to dump the saved contexts of the vCPUs : configuration
of the base END index of the vCPU and the Interrupt Pending Buffer
register, which is updated when an interrupt can not be presented.

When dumping the NVT table, we skip empty indirect pages which are not
necessarily allocated.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-21-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater f22f56dd48 ppc/pnv: Extend XiveRouter with a get_block_id() handler
When doing CAM line compares, fetch the block id from the interrupt
controller which can have set the PC_TCTXT_CHIPID field.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-20-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater dc2526e45a ppc/pnv: Introduce a pnv_xive_block_id() helper
When PC_TCTXT_CHIPID_OVERRIDE is configured, the PC_TCTXT_CHIPID field
overrides the hardwired chip ID in the Powerbus operations and for CAM
compares. This is typically used in the one block-per-chip configuration
to associate a unique block id number to each IC of the system.

Simplify the model with a pnv_xive_block_id() helper and remove
'tctx_chipid' which becomes useless.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-19-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater a5b841f18c ppc/xive: Introduce a xive_tctx_ipb_update() helper
We will use it to resend missed interrupts when a vCPU context is
pushed on a HW thread.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-17-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 8b3aaaa1a9 ppc/xive: Remove the get_tctx() XiveRouter handler
It is now unused.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-16-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater d024a2c111 ppc/xive: Move the TIMA operations to the controller model
On the P9 Processor, the thread interrupt context registers of a CPU
can be accessed "directly" when by load/store from the CPU or
"indirectly" by the IC through an indirect TIMA page. This requires to
configure first the PC_TCTXT_INDIRx registers.

Today, we rely on the get_tctx() handler to deduce from the CPU PIR
the chip from which the TIMA access is being done. By handling the
TIMA memory ops under the interrupt controller model of each machine,
we can uniformize the TIMA direct and indirect ops under PowerNV. We
can also check that the CPUs have been enabled in the XIVE controller.

This prepares ground for the future versions of XIVE.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-15-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 5373c61d6a ppc/pnv: Clarify how the TIMA is accessed on a multichip system
The TIMA region gives access to the thread interrupt context registers
of a CPU. It is mapped at the same address on all chips and can be
accessed by any CPU of the system. To identify the chip from which the
access is being done, the PowerBUS uses a 'chip' field in the
load/store messages. QEMU does not model these messages, instead, we
extract the chip id from the CPU PIR and do a lookup at the machine
level to fetch the targeted interrupt controller.

Introduce pnv_get_chip() and pnv_xive_tm_get_xive() helpers to clarify
this process in pnv_xive_get_tctx(). The latter will be removed in the
subsequent patches but the same principle will be kept.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-14-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Greg Kurz 4ffb749688 spapr: Pass the maximum number of vCPUs to the KVM interrupt controller
The XIVE and XICS-on-XIVE KVM devices on POWER9 hosts can greatly reduce
their consumption of some scarce HW resources, namely Virtual Presenter
identifiers, if they know the maximum number of vCPUs that may run in the
VM.

Prepare ground for this by passing the value down to xics_kvm_connect()
and kvmppc_xive_connect(). This is purely mechanical, no functional
change.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157478678301.67101.2717368060417156338.stgit@bahia.tlslab.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 4fb42350dc ppc/xive: Extend the TIMA operation with a XivePresenter parameter
The TIMA operations are performed on behalf of the XIVE IVPE sub-engine
(Presenter) on the thread interrupt context registers. The current
operations supported by the model are simple and do not require access
to the controller but more complex operations will need access to the
controller NVT table and to its configuration.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-13-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater d3eb47a2a1 ppc/xive: Introduce a XiveFabric interface
The XiveFabric QOM interface acts as the PowerBUS interface between
the interrupt controller and the system and should be implemented by
the QEMU machine. On HW, the XIVE sub-engine is responsible for the
communication with the other chip is the Common Queue (CQ) bridge
unit.

This interface offers a 'match_nvt' handler to perform the CAM line
matching when looking for a XIVE Presenter with a dispatched NVT.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 119eaa9d11 ppc/pnv: Fix TIMA indirect access
When the TIMA of a CPU needs to be accessed from the indirect page,
the thread id of the target CPU is first stored in the PC_TCTXT_INDIR0
register. This thread id is relative to the chip and not to the system.

Introduce a helper routine to look for a CPU of a given PIR and fix
pnv_xive_get_indirect_tctx() to scan only the threads of the local
chip and not the whole machine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 5014c60261 ppc/pnv: Introduce a pnv_xive_is_cpu_enabled() helper
and use this helper to exclude CPUs which are not enabled in the XIVE
controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Cédric Le Goater 4a89e20458 ppc: Introduce a ppc_cpu_pir() helper
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Greg Kurz 4fa28f2390 ppc/pnv: Instantiate cores separately
Allocating a big void * array to store multiple objects isn't a
recommended practice for various reasons:
 - no compile time type checking
 - potential dangling pointers if a reference on an individual is
  taken and the array is freed later on
 - duplicate boiler plate everywhere the array is browsed through

Allocate an array of pointers and populate it instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Cédric Le Goater 13bee8521c ppc/xive: Introduce a XivePresenter interface
When the XIVE IVRE sub-engine (XiveRouter) looks for a Notification
Virtual Target (NVT) to notify, it broadcasts a message on the
PowerBUS to find an XIVE IVPE sub-engine (Presenter) with the NVT
dispatched on one of its HW threads, and then forwards the
notification if any response was received.

The current XIVE presenter model is sufficient for the pseries machine
because it has a single interrupt controller device, but the PowerNV
machine can have multiple chips each having its own interrupt
controller. In this case, the XIVE presenter model is too simple and
the CAM line matching should scan all chips of the system.

To start fixing this issue, we first extend the XIVE Router model with
a new XivePresenter QOM interface representing the XIVE IVPE
sub-engine. This interface exposes a 'match_nvt' handler which the
sPAPR and PowerNV XIVE Router models will need to implement to perform
the CAM line matching.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Cédric Le Goater e2392d4395 ppc/pnv: Create BMC devices at machine init
The BMC of the OpenPOWER systems monitors the machine state using
sensors, controls the power and controls the access to the PNOR flash
device containing the firmware image required to boot the host.

QEMU models the power cycle process, access to the sensors and access
to the PNOR device. But, for these features to be available, the QEMU
PowerNV machine needs two extras devices on the command line, an IPMI
BT device for communication and a BMC backend device:

  -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10

The BMC properties are then defined accordingly in the device tree and
OPAL self adapts. If a BMC device and an IPMI BT device are not
available, OPAL does not try to communicate with the BMC in any
manner. This is not how real systems behave.

To be closer to the default behavior, create an IPMI BMC simulator
device and an IPMI BT device at machine initialization time. We loose
the ability to define an external BMC device but there are benefits:

  - a better match with real systems,
  - a better test coverage of the OPAL code,
  - system powerdown and reset commands that work,
  - a QEMU device tree compliant with the specifications (*).

(*) Still needs a MBOX device.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191121162340.11049-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Cédric Le Goater ca661fae81 ppc/pnv: Add HIOMAP commands
This activates HIOMAP support on the QEMU PowerNV machine. The PnvPnor
model is used to access the flash contents. The model simply maps the
contents at a fix offset and enables or disables the mapping.

HIOMAP Protocol description :

  https://github.com/openbmc/hiomapd/blob/master/Documentation/protocol.md

Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191028070027.22752-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Cédric Le Goater e6488eeba8 ppc/xive: Introduce helpers for the NVT id
Each vCPU in the system is identified with an NVT identifier which is
pushed in the OS CAM line (QW1W2) of the HW thread interrupt context
register when the vCPU is dispatched on a HW thread. This identifier
is used by the presenter subengine to find a matching target to notify
of an event. It is also used to fetch the associate NVT structure
which may contain pending interrupts that need a resend.

Add a couple of helpers for the NVT ids. The NVT space is 19 bits
wide, giving a maximum of 512K per chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191115162436.30548-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Cédric Le Goater 516883c2f1 ppc/xive: Record the IPB in the associated NVT
When an interrupt can not be presented to a vCPU, because it is not
running on any of the HW treads, the XIVE presenter updates the
Interrupt Pending Buffer register of the associated XIVE NVT
structure. This is only done if backlog is activated in the END but
this is generally the case.

The current code assumes that the fields of the NVT structure is
architected with the same layout of the thread interrupt context
registers. Fix this assumption and define an offset for the IPB
register backup value in the NVT.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191115162436.30548-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Cédric Le Goater 35dde57662 ppc/pnv: Add a PNOR model
On a POWERPC PowerNV system, the host firmware is stored in a PNOR
flash chip which contents is mapped on the LPC bus. This model adds a
simple dummy device to map the contents of a block device in the host
address space.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191021131215.3693-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:47 +11:00
Greg Kurz 0990ce6a2e ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip
SpaprInterruptControllerClass and PnvChipClass have an intc_create() method
that calls the appropriate routine, ie. icp_create() or xive_tctx_create(),
to establish the link between the VCPU and the presenter component of the
interrupt controller during realize.

There aren't any symmetrical call to be called when the VCPU gets unrealized
though. It is assumed that object_unparent() is the only thing to do.

This is questionable because the parenting logic around the CPU and
presenter objects is really an implementation detail of the interrupt
controller. It shouldn't be open-coded in the machine code.

Fix this by adding an intc_destroy() method that undoes what was done in
intc_create(). Also NULLify the presenter pointers to avoid having
stale pointers around. This will allow to reliably check if a vCPU has
a valid presenter.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157192724208.3146912.7254684777515287626.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
2019-11-18 11:49:11 +01:00
Cédric Le Goater 97c00c5444 spapr/xive: Set the OS CAM line at reset
When a Virtual Processor is scheduled to run on a HW thread, the
hypervisor pushes its identifier in the OS CAM line. When running with
kernel_irqchip=off, QEMU needs to emulate the same behavior.

Set the OS CAM line when the interrupt presenter of the sPAPR core is
reset. This will also cover the case of hot-plugged CPUs.

This change also has the benefit to remove the use of CPU_FOREACH()
which can be unsafe.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20191022163812.330-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 13:34:15 +11:00
Cédric Le Goater d49e8a9b46 ppc: Reset the interrupt presenter from the CPU reset handler
On the sPAPR machine and PowerNV machine, the interrupt presenters are
created by a machine handler at the core level and are reset
independently. This is not consistent and it raises issues when it
comes to handle hot-plugged CPUs. In that case, the presenters are not
reset. This is less of an issue in XICS, although a zero MFFR could
be a concern, but in XIVE, the OS CAM line is not set and this breaks
the presenting algorithm. The current code has workarounds which need
a global cleanup.

Extend the sPAPR IRQ backend and the PowerNV Chip class with a new
cpu_intc_reset() handler called by the CPU reset handler and remove
the XiveTCTX reset handler which is now redundant.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191022163812.330-6-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 13:33:45 +11:00
Cédric Le Goater aa5ac64b23 ppc/pnv: Add a PnvChip pointer to PnvCore
We will use it to reset the interrupt presenter from the CPU reset
handler.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20191022163812.330-5-clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 13:33:33 +11:00
David Gibson 54255c1f65 spapr: Move SpaprIrq::nr_xirqs to SpaprMachineClass
For the benefit of peripheral device allocation, the number of available
irqs really wants to be the same on a given machine type version,
regardless of what irq backends we are using.  That's the case now, but
only because we make sure the different SpaprIrq instances have the same
value except for the special legacy one.

Since this really only depends on machine type version, move the value to
SpaprMachineClass instead of SpaprIrq.  This also puts the code to set it
to the lower value on old machine types right next to setting
legacy_irq_allocation, which needs to go hand in hand.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 8cbe71ecb8 spapr: Remove SpaprIrq::nr_msis
The nr_msis value we use here has to line up with whether we're using
legacy or modern irq allocation.  Therefore it's safer to derive it based
on legacy_irq_allocation rather than having SpaprIrq contain a canned
value.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 605994e5b7 spapr, xics, xive: Move SpaprIrq::post_load hook to backends
The remaining logic in the post_load hook really belongs to the interrupt
controller backends, and just needs to be called on the active controller
(after the active controller is set to the right thing based on the
incoming migration in the generic spapr_irq_post_load() logic).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 567192d486 spapr, xics, xive: Move SpaprIrq::reset hook logic into activate/deactivate
It turns out that all the logic in the SpaprIrq::reset hooks (and some in
the SpaprIrq::post_load hooks) isn't really related to resetting the irq
backend (that's handled by the backends' own reset routines).  Rather its
about getting the backend ready to be the active interrupt controller or
stopping being the active interrupt controller - reset (and post_load) is
just the only time that changes at present.

To make this flow clearer, move the logic into the explicit backend
activate and deactivate hooks.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 0a17e0c39f spapr: Remove SpaprIrq::init_kvm hook
This hook is a bit odd.  The only caller is spapr_irq_init_kvm(), but
it explicitly takes an SpaprIrq *, so it's never really called through the
current SpaprIrq.  Essentially this is just a way of passing through a
function pointer so that spapr_irq_init_kvm() can handle some
configuration and error handling logic without duplicating it between the
xics and xive reset paths.

So, make it just take that function pointer.  Because of earlier reworks
to the KVM connect/disconnect code in the xics and xive backends we can
also eliminate some wrapper functions and streamline error handling a bit.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 98a39a7927 spapr, xics, xive: Match signatures for XICS and XIVE KVM connect routines
Both XICS and XIVE have routines to connect and disconnect KVM with
similar but not identical signatures.  This adjusts them to match
exactly, which will be useful for further cleanups later.

While we're there, we add an explicit return value to the connect path
to streamline error reporting in the callers.  We remove error
reporting the disconnect path.  In the XICS case this wasn't used at
all.  In the XIVE case the only error case was if the KVM device was
set up, but KVM didn't have the capability to do so which is pretty
obviously impossible.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 05289273c0 spapr, xics, xive: Move dt_populate from SpaprIrq to SpaprInterruptController
This method depends only on the active irq controller.  Now that we've
formalized the notion of active controller we can dispatch directly
through that, rather than dispatching via SpaprIrq with the dual
version having to do a second conditional dispatch.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 328d8eb24d spapr, xics, xive: Move print_info from SpaprIrq to SpaprInterruptController
This method depends only on the active irq controller.  Now that we've
formalized the notion of active controller we can dispatch directly
through that, rather than dispatching via SpaprIrq with the dual
version having to do a second conditional dispatch.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 7bcdbcca2f spapr, xics, xive: Move set_irq from SpaprIrq to SpaprInterruptController
This method depends only on the active irq controller.  Now that we've
formalized the notion of active controller we can dispatch directly through
that, rather than dispatching via SpaprIrq with the dual version having
to do a second conditional dispatch.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 81106ddd1a spapr: Formalize notion of active interrupt controller
spapr now has the mechanism of constructing both XICS and XIVE instances of
the SpaprInterruptController interface.  However, only one of the interrupt
controllers will actually be active at any given time, depending on feature
negotiation with the guest.  This is handled in the current code via
spapr_irq_current() which checks the OV5 vector from feature negotiation to
determine the current backend.

Determining the active controller at the point we need it like this
can be pretty confusing, because it makes it very non obvious at what
points the active controller can change.  This can make it difficult
to reason about the code and where a change of active controller could
appear in sequence with other events.

Make this mechanism more explicit by adding an 'active_intc' pointer
and an explicit spapr_irq_update_active_intc() function to update it
from the CAS state.  We also add hooks on the intc backend which will
get called when it is activated or deactivated.

For now we just introduce the switch and hooks, later patches will
actually start using them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 0b0e52b131 spapr, xics, xive: Move irq claim and free from SpaprIrq to SpaprInterruptController
These methods, like cpu_intc_create, really belong to the interrupt
controller, but need to be called on all possible intcs.

Like cpu_intc_create, therefore, make them methods on the intc and
always call it for all existing intcs.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson ebd6be089b spapr, xics, xive: Move cpu_intc_create from SpaprIrq to SpaprInterruptController
This method essentially represents code which belongs to the interrupt
controller, but needs to be called on all possible intcs, rather than
just the currently active one.  The "dual" version therefore calls
into the xics and xive versions confusingly.

Handle this more directly, by making it instead a method on the intc
backend, and always calling it on every backend that exists.

While we're there, streamline the error reporting a bit.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 150e25f85b spapr, xics, xive: Introduce SpaprInterruptController QOM interface
The SpaprIrq structure is used to represent ths spapr machine's irq
backend.  Except that it kind of conflates two concepts: one is the
backend proper - a specific interrupt controller that we might or
might not be using, the other is the irq configuration which covers
the layout of irq space and which interrupt controllers are allowed.

This leads to some pretty confusing code paths for the "dual"
configuration where its hooks redirect to other SpaprIrq structures
depending on the currently active irq controller.

To clean this up, we start by introducing a new
SpaprInterruptController QOM interface to represent strictly an
interrupt controller backend, not counting anything configuration
related.  We implement this interface in the XICs and XIVE interrupt
controllers, and in future we'll move relevant methods from SpaprIrq
into it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
Greg Kurz 29cb418749 spapr: Set VSMT to smp_threads by default
Support for setting VSMT is available in KVM since linux-4.13. Most distros
that support KVM on POWER already have it. It thus seem reasonable enough
to have the default machine to set VSMT to smp_threads.

This brings contiguous VCPU ids and thus brings their upper bound down to
the machine's max_cpus. This is especially useful for XIVE KVM devices,
which may thus allocate only one VP descriptor per VCPU.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157010411885.246126.12610015369068227139.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 09:36:55 +11:00
Cédric Le Goater 106695ab12 ppc/pnv: Improve trigger data definition
The trigger data is used for both triggers of a HW source interrupts,
PHB, PSI, and triggers for rerouting interrupts between interrupt
controllers.

When an interrupt is rerouted, the trigger data follows an "END
trigger" format. In that case, the remote IC needs EAS containing an
END index to perform a lookup of an END.

An END trigger, bit0 of word0 set to '1', is defined as :

             |0123|4567|0123|4567|0123|4567|0123|4567|
    W0 E=1   |1P--|BLOC|          END IDX            |
    W1 E=1   |M   |           END DATA               |

An EAS is defined as :

             |0123|4567|0123|4567|0123|4567|0123|4567|
    W0       |V---|BLOC|          END IDX            |
    W1       |M   |          END DATA                |

The END trigger adds an extra 'PQ' bit, bit1 of word0 set to '1',
signaling that the PQ bits have been checked. That bit is unused in
the initial EAS definition.

When a HW device performs the trigger, the trigger data follows an
"EAS trigger" format because the trigger data in that case contains an
EAS index which the IC needs to look for.

An EAS trigger, bit0 of word0 set to '0', is defined as :

             |0123|4567|0123|4567|0123|4567|0123|4567|
    W0 E=0   |0P--|---- ---- ---- ---- ---- ---- ----|
    W1 E=0   |BLOC|            EAS INDEX             |

There is also a 'PQ' bit, bit1 of word0 to '1', signaling that the
PQ bits have been checked.

Introduce these new trigger bits and rename the XIVE_SRCNO macros in
XIVE_EAS to reflect better the nature of the data.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191007084102.29776-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 09:36:55 +11:00
David Gibson f478d9af21 spapr: Eliminate SpaprIrq::init hook
This method is used to set up the interrupt backends for the current
configuration.  However, this means some confusing redirection between
the "dual" mode init and the init hooks for xics only and xive only modes.

Since we now have simple flags indicating whether XICS and/or XIVE are
supported, it's easier to just open code each initialization directly in
spapr_irq_init().  This will also make some future cleanups simpler.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson ca62823b79 spapr: Use less cryptic representation of which irq backends are supported
SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector
5 which indicates whether XICS, XIVE or both interrupt controllers are
available.  As usual for PAPR, the encoding is kind of overly complicated
and confusing (though to be fair there are some backwards compat things it
has to handle).

But to make our internal code clearer, have SpaprIrq encode more directly
which backends are available as two booleans, and derive the OV5 value from
that at the point we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson e594c2ad1c xive: Improve irq claim/free path
spapr_xive_irq_claim() returns a bool to indicate if it succeeded.
But most of the callers and one callee use int return values and/or an
Error * with more information instead.  In any case, ints are a more
common idiom for success/failure states than bools (one never knows
what sense they'll be in).

So instead change to an int return value to indicate presence of error
+ an Error * to describe the details through that call chain.

It also didn't actually check if the irq was already claimed, which is
one of the primary purposes of the claim path, so do that.

spapr_xive_irq_free() also returned a bool... which no callers checked
and was always true, so just drop it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson f233cee97b spapr: Handle freeing of multiple irqs in frontend only
spapr_irq_free() can be used to free multiple irqs at once. That's useful
for its callers, but there's no need to make the individual backend hooks
handle this.  We can loop across the irqs in spapr_irq_free() itself and
have the hooks just do one at time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04 19:08:23 +10:00
David Gibson 14789694cd spapr: Eliminate SpaprIrq:get_nodename method
This method is used to determine the name of the irq backend's node in the
device tree, so that we can find its phandle (after SLOF may have modified
it from the phandle we initially gave it).

But, in the two cases the only difference between the node name is the
presence of a unit address.  Searching for a node name without considering
unit address is standard practice for the device tree, and
fdt_subnode_offset() will do exactly that, making this method unecessary.

While we're there, remove the XICS_NODENAME define.  The name
"interrupt-controller" is required by PAPR (and IEEE1275), and a bunch of
places assume it already.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson af1861511d spapr: Simplify spapr_qirq() handling
Currently spapr_qirq(), whic is used to find the qemu_irq for an spapr
global irq number, redirects through the SpaprIrq::qirq method.  But
the array of qemu_irqs is allocated in the PAPR layer, not the
backends, and so the method implementations all return the same thing,
just differing in the preliminary checks they make.

So, we can remove the method, and just implement spapr_qirq() directly,
including all the relevant checks in one place.  We change all those
checks into assert()s as well, since a failure here indicates an error in
the calling code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-10-04 19:08:22 +10:00
David Gibson fe9b61b246 spapr: Eliminate nr_irqs parameter to SpaprIrq::init
The only reason this parameter was needed was to work around the
inconsistent meaning of nr_irqs between xics and xive.  Now that we've
fixed that, we can consistently use the number directly in the SpaprIrq
configuration.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson ad8de98636 spapr: Clarify and fix handling of nr_irqs
Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but
it means slightly different things.  For XICS (or, strictly, the ICS) it
indicates the number of "real" external IRQs.  Those start at XICS_IRQ_BASE
(0x1000) and don't include the special IPI vector.  For XIVE, however, it
includes the whole IRQ space, including XIVE's many IPI vectors.

The spapr code currently doesn't handle this sensibly, with the
nr_irqs value in SpaprIrq having different meanings depending on the
backend.  We fix this by renaming nr_irqs to nr_xirqs and making it
always indicate just the number of external irqs, adjusting the value
we pass to XIVE accordingly.  We also move to using common constants
in most of the irq configurations, to make it clearer that the IRQ
space looks the same to the guest (and emulated devices), even if the
backend is different.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson 7678b74a94 spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper
Every caller of spapr_vio_qirq() immediately calls qemu_irq_pulse() with
the result, so we might as well just fold that into the helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-10-04 19:08:22 +10:00
David Gibson 9db8c551c9 xics: Create sPAPR specific ICS subtype
We create a subtype of TYPE_ICS specifically for sPAPR.  For now all this
does is move the setup of the PAPR specific hcalls and RTAS calls to
the realize() function for this, rather than requiring the PAPR code to
explicitly call xics_spapr_init().  In future it will have some more
function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson 642e92719e xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes
TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever
instantiated.  The existence of different classes is mostly a hang
over from when we (misguidedly) had separate subtypes for the KVM and
non-KVM version of the device.

There could be some call for an abstract base type for ICS variants
that use a different representation of their state (PowerNV PHB3 might
want this).  The current split isn't really in the right place for
that though.  If we need this in future, we can re-implement it more
in line with what we actually need.

So, collapse the two classes together into just TYPE_ICS.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson da2ef5b2f2 xics: Eliminate reset hook
Currently TYPE_XICS_BASE and TYPE_XICS_SIMPLE have their own reset methods,
using the standard technique for having the subtype call the supertype's
methods before doing its own thing.

But TYPE_XICS_SIMPLE is the only subtype of TYPE_XICS_BASE ever
instantiated, so there's no point having the split here.  Merge them
together into just an ics_reset() function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson 28976c99cf xics: Rename misleading ics_simple_*() functions
There are a number of ics_simple_*() functions that aren't actually
specific to TYPE_XICS_SIMPLE at all, and are equally valid on
TYPE_XICS_BASE.  Rename them to ics_*() accordingly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:22 +10:00
David Gibson d5803c7319 xics: Eliminate 'reject', 'resend' and 'eoi' class hooks
Currently ics_reject(), ics_resend() and ics_eoi() indirect through
class methods.  But there's only one implementation of each method,
the one in TYPE_ICS_SIMPLE.  TYPE_ICS_BASE has no implementation, but
it's never instantiated, and has no other subtypes.

So clean up by eliminating the method and just having ics_reject(),
ics_resend() and ics_eoi() contain the logic directly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:21 +10:00
David Gibson 00ed3da9b5 xics: Minor fixes for XICSFabric interface
Interface instances should never be directly dereferenced.  So, the common
practice is to make them incomplete types to make sure no-one does that.
XICSFrabric, however, had a dummy type which is less safe.

We were also using OBJECT_CHECK() where we should have been using
INTERFACE_CHECK().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04 19:08:21 +10:00
Alexey Kardashevskiy 744a928cce spapr: Stop providing RTAS blob
SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
David Gibson daa36379ce spapr: Simplify handling of pre ISA 3.0 guest workaround handling
Certain old guest versions don't understand the radix MMU introduced with
POWER ISA 3.0, but incorrectly select it if presented with the option at
CAS time.  We workaround this in qemu by explicitly excluding the radix
(and other ISA 3.0 linked) options if the guest doesn't explicitly note
support for ISA 3.0.

This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty
vague.  Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for.

In addition, we unnecessarily call spapr_populate_pa_features() with
different options when initially constructing the device tree and when
adjusting it at CAS time.  At the initial construct time cas_pre_isa3_guest
is already false, so we can still use the flag, rather than explicitly
overriding it to be false at the callsite.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-10-04 10:25:23 +10:00
Cédric Le Goater 4a99d40551 spapr/irq: Introduce an ics_irq_free() helper
It will help us to discard interrupt numbers which have not been
claimed in the next patch.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190911133937.2716-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Balamuruhan S 3887d24123 hw/ppc/pnv_homer: add PowerNV homer device model
add PnvHomer device model to emulate homer memory access
for pstate table, occ-sensors, slw, occ static and dynamic
values for Power8 and Power9 chips.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-4-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Balamuruhan S f3db82660d hw/ppc/pnv_occ: add sram device model for occ common area
emulate occ common area region with occ sram device model which
occ and skiboot uses it to communicate regarding sensors, slw
and HWMON in PowerNV emulated host.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-3-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Balamuruhan S 7454558c69 hw/ppc/pnv_xscom: retrieve homer/occ base address from PBA BARs
During PowerNV boot skiboot populates the device tree by
retrieving base address of homer/occ common area from
PBA BARs and prd ipoll mask by accessing xscom read/write
accesses.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-2-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy 6c3829a265 spapr_pci: Advertise BAR reallocation capability
The pseries guests do not normally allocate PCI resources and rely on
the system firmware doing so. Furthermore at least at some point in
the past the pseries guests won't even allowed to change BARs, probably
it is still the case for phyp. So since the initial commit we have [1]
which prevents resource reallocation.

This is not a problem until we want specific BAR alignments, for example,
PAGE_SIZE==64k to make sure we can still map MMIO BARs directly. For
the boot time devices we handle this in SLOF [2] but since QEMU's RTAS
does not allocate BARs, the guest does this instead and does not align
BARs even if Linux is given pci=resource_alignment=16@pci:0:0 as
PCI_PROBE_ONLY makes Linux ignore alignment requests.

ARM folks added a dial to control PCI_PROBE_ONLY via the device tree [3].
This makes use of the dial to advertise to the guest that we can handle
BAR reassignments. This limits the change to the latest pseries machine
to avoid old guests explosion.

We do not remove the flag from [1] as pseries guests are still supported
under phyp so having that removed may cause problems.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/platforms/pseries/setup.c?h=v5.1#n773
[2] https://git.qemu.org/?p=SLOF.git;a=blob;f=board-qemu/slof/pci-phb.fs;h=06729bcf77a0d4e900c527adcd9befe2a269f65d;hb=HEAD#l338
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f81c11af
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190719043734.108462-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-29 09:46:07 +10:00
Peter Maydell f3b8f18ebf Monitor patches for 2019-08-21
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1dZKsSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTJ4QP/10izA+dSofQ9404GRq3TNzwRCKugU44
 nES9CqDh6x5emx+ADQWYkugblgfH9GOvUaAUNtY+uFaEr55yC/F+VWeVXvyjt5U6
 ZpPZqIRDOHo2+PZrddr/KcKmiomS6plz03m9bzb3pYN1yIl2ZzgClAhAqWQLk0WB
 wwiY+YsJ83YR4sdiRMZkuF+UL7N8fSqYvIIj0yzM8+8ONDor9n16PoPeFg3JSsyG
 aMxXIUnSBZAVtClaNkUPtS0Wf9XEuqoG1rvMRV4Vv+eeb7fwA414DqanRJdLlGMA
 yNRtFcVyztCfjgVEXnY9JJlFe6pDkoe8ycoimQ4YA60C9c1DIMHqyjFWXRHfDwk8
 bYMSX6CTpfoEvbTfmwqYR6KSkb/KuXiFDmcYlTYFvIt3grhhdHQbru9vy+E5sm/b
 j3CPV2DTCkeGY+oZFfKIaQT9yoWZOhmMY5doMTYyinXygPTGQROUrHtzUeRXKmJZ
 arqDRmh+mlEiGETNeYQCI45eYCSDYxO+UNrhszxhmv6B1+ixhIrV2oXhi61vVBeY
 yngY4EILbuA2Z/E4BevJk91ESWJTr3UP13c6p7yf21iN4BD1KkHy5HoXCgYfQDeV
 4kar49g6WQ/VQEiwhi65Xd0OwstynkcV69F+kMagVMgaLeRsdU5ikGJQzxTeWJRl
 SPpc7oDwuAS+
 =2F3E
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2019-08-21' into staging

Monitor patches for 2019-08-21

# gpg: Signature made Wed 21 Aug 2019 16:35:07 BST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-monitor-2019-08-21:
  monitor/qmp: Update comment for commit 4eaca8de26
  qdev: Collect HMP handlers command handlers in qdev-monitor.c
  qapi: Move query-target from misc.json to machine.json
  hw/core: Move cpu.c, cpu.h from qom/ to hw/core/

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-22 10:31:21 +01:00
Markus Armbruster 2e5b09fd0e hw/core: Move cpu.c, cpu.h from qom/ to hw/core/
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190709152053.16670-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Rebased onto merge commit 95a9457fd44; missed instances of qom/cpu.h
in comments replaced]
2019-08-21 13:24:01 +02:00
Greg Kurz e1588bcdd2 spapr/irq: Drop spapr_irq_msi_reset()
PHBs already take care of clearing the MSIs from the bitmap during reset
or unplug. No need to do this globally from the machine code. Rather add
an assert to ensure that PHBs have acted as expected.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156415228966.1064338.190189424190233355.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix crash in qtest case where spapr->irq_map can be NULL at the
 new assert()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Nicholas Piggin 93eac7b8f4 spapr: Implement ibm,suspend-me
This has been useful to modify and test the Linux pseries suspend
code but it requires modification to the guest to call it (due to
being gated by other unimplemented features). It is not otherwise
used by Linux yet, but work is slowly progressing there.

This allows a (lightly modified) guest kernel to suspend with
`echo mem > /sys/power/state` and be resumed with system_wakeup
monitor command.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190722061752.22114-2-npiggin@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Cédric Le Goater c5e760e0f2 ppc/xive: Improve 'info pic' support
Provide a better output of the XIVE END structures including the
escalation information and extend the PowerNV machine 'info pic'
command with a dump of the END EAS table used for escalations.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Cédric Le Goater ad31e2d242 ppc/xive: Provide silent escalation support
When the 's' bit is set the escalation is said to be 'silent' or
'silent/gather'. In such configuration, the notification sequence is
skipped and only the escalation sequence is performed. This is used to
configure all the EQs of a vCPU to escalate on a single EQ which will
then target the hypervisor.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Cédric Le Goater 53e934921d ppc/xive: Provide unconditional escalation support
When the 'u' bit is set the escalation is said to be 'unconditional'
which means that the ESe PQ bits are not used. Introduce a
xive_router_end_es_notify() routine to share code with the ESn
notification.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Cédric Le Goater 1994d3aa47 ppc/xive: use an abstract type for XiveNotifier
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Michael Roth 0fb6bd0732 spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy
This implements the H_TPM_COMM hypercall, which is used by an
Ultravisor to pass TPM commands directly to the host's TPM device, or
a TPM Resource Manager associated with the device.

This also introduces a new virtual device, spapr-tpm-proxy, which
is used to configure the host TPM path to be used to service
requests sent by H_TPM_COMM hcalls, for example:

  -device spapr-tpm-proxy,id=tpmp0,host-path=/dev/tpmrm0

By default, no spapr-tpm-proxy will be created, and hcalls will return
H_FUNCTION.

The full specification for this hypercall can be found in
docs/specs/ppc-spapr-uv-hcalls.txt

Since SVM-related hcalls like H_TPM_COMM use a reserved range of
0xEF00-0xEF80, we introduce a separate hcall table here to handle
them.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com
Message-Id: <20190717205842.17827-3-mdroth@linux.vnet.ibm.com>
[dwg: Corrected #include for upstream change]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:12 +10:00
Nicholas Piggin 3a6e6224a9 spapr: Implement H_PROD
H_PROD is added, and H_CEDE is modified to test the prod bit
according to PAPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-3-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:12 +10:00
Nicholas Piggin 03ef074c04 spapr: Implement dispatch tracking for tcg
Implement cpu_exec_enter/exit on ppc which calls into new methods of
the same name in PPCVirtualHypervisorClass. These are used by spapr
to implement the splpar VPA dispatch counter initially.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-2-npiggin@gmail.com>
[dwg: Removed unnecessary CONFIG_USER_ONLY checks as suggested by gkurz]
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:11 +10:00
Markus Armbruster 54d31236b9 sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related
to the system-emulator.  Evidence:

* It's included widely: in my "build everything" tree, changing
  sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600
  objects (not counting tests and objects that don't depend on
  qemu/osdep.h, down from 5400 due to the previous two commits).

* It pulls in more than a dozen additional headers.

Split stuff related to run state management into its own header
sysemu/runstate.h.

Touching sysemu/sysemu.h now recompiles some 850 objects.  qemu/uuid.h
also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400
to 4200.  Touching new sysemu/runstate.h recompiles some 500 objects.

Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also
add qemu/main-loop.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 13:37:36 +02:00
Markus Armbruster 2f780b6a91 sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h
In my "build everything" tree, changing sysemu/sysemu.h triggers a
recompile of some 1800 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h, down from 5400 due to the
previous commit).

Several headers include sysemu/sysemu.h just to get typedef
VMChangeStateEntry.  Move it from sysemu/sysemu.h to qemu/typedefs.h.
Spell its structure tag the same while there.  Drop the now
superfluous includes of sysemu/sysemu.h from headers.

Touching sysemu/sysemu.h now recompiles some 1100 objects.
qemu/uuid.h also drops from 1800 to 1100, and
qapi/qapi-types-run-state.h from 5000 to 4400.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-29-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster a27bd6c779 Include hw/qdev-properties.h less
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h.  Include hw/qdev-core.h there
instead.

hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.

While there, delete a few superfluous inclusions of hw/qdev-core.h.

Touching hw/qdev-properties.h now recompiles some 1200 objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster d484205210 Include exec/memory.h slightly less
Drop unnecessary inclusions from headers.  Downgrade a few more to
exec/hwaddr.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-17-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster d645427057 Include migration/vmstate.h less
In my "build everything" tree, changing migration/vmstate.h triggers a
recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

hw/hw.h supposedly includes it for convenience.  Several other headers
include it just to get VMStateDescription.  The previous commit made
that unnecessary.

Include migration/vmstate.h only where it's still needed.  Touching it
now recompiles only some 1600 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-16-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster 6a0acfff99 Clean up inclusion of exec/cpu-common.h
migration/qemu-file.h neglects to include it even though it needs
ram_addr_t.  Fix that.  Drop a few superfluous inclusions elsewhere.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-14-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster 64552b6be4 Include hw/irq.h a lot less
In my "build everything" tree, changing hw/irq.h triggers a recompile
of some 5400 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

hw/hw.h supposedly includes it for convenience.  Several other headers
include it just to get qemu_irq and.or qemu_irq_handler.

Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to
qemu/typedefs.h, and then include hw/irq.h only where it's still
needed.  Touching it now recompiles only some 500 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-13-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster 2ae16a6aa4 Include generated QAPI headers less
Some of the generated qapi-types-MODULE.h are included all over the
place.  Changing a QAPI type can trigger massive recompiling.  Top
scorers recompile more than 1000 out of some 6600 objects (not
counting tests and objects that don't depend on qemu/osdep.h):

    6300 qapi/qapi-builtin-types.h
    5700 qapi/qapi-types-run-state.h
    3900 qapi/qapi-types-common.h
    3300 qapi/qapi-types-sockets.h
    3000 qapi/qapi-types-misc.h
    3000 qapi/qapi-types-crypto.h
    3000 qapi/qapi-types-job.h
    3000 qapi/qapi-types-block-core.h
    2800 qapi/qapi-types-block.h
    1300 qapi/qapi-types-net.h

Clean up headers to include generated QAPI headers only where needed.
Impact is negligible except for hw/qdev-properties.h.

This header includes qapi/qapi-types-block.h and
qapi/qapi-types-misc.h.  They are used only in expansions of property
definition macros such as DEFINE_PROP_BLOCKDEV_ON_ERROR() and
DEFINE_PROP_OFF_AUTO().  Moving their inclusion from
hw/qdev-properties.h to the users of these macros avoids pointless
recompiles.  This is how other property definition macros, such as
DEFINE_PROP_NETDEV(), already work.

Improves things for some of the top scorers:

    3600 qapi/qapi-types-common.h
    2800 qapi/qapi-types-sockets.h
     900 qapi/qapi-types-misc.h
    2200 qapi/qapi-types-crypto.h
    2100 qapi/qapi-types-job.h
    2100 qapi/qapi-types-block-core.h
     270 qapi/qapi-types-block.h

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-3-armbru@redhat.com>
2019-08-16 13:31:51 +02:00
Markus Armbruster ec150c7e09 include: Make headers more self-contained
Back in 2016, we discussed[1] rules for headers, and these were
generally liked:

1. Have a carefully curated header that's included everywhere first.  We
   got that already thanks to Peter: osdep.h.

2. Headers should normally include everything they need beyond osdep.h.
   If exceptions are needed for some reason, they must be documented in
   the header.  If all that's needed from a header is typedefs, put
   those into qemu/typedefs.h instead of including the header.

3. Cyclic inclusion is forbidden.

This patch gets include/ closer to obeying 2.

It's actually extracted from my "[RFC] Baby steps towards saner
headers" series[2], which demonstrates a possible path towards
checking 2 automatically.  It passes the RFC test there.

[1] Message-ID: <87h9g8j57d.fsf@blackfin.pond.sub.org>
    https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html
[2] Message-Id: <20190711122827.18970-1-armbru@redhat.com>
    https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg02715.html

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-2-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:51 +02:00
Cédric Le Goater 310cda5b5e spapr/xive: Fix migration of hot-plugged CPUs
The migration sequence of a guest using the XIVE exploitation mode
relies on the fact that the states of all devices are restored before
the machine is. This is not true for hot-plug devices such as CPUs
which state come after the machine. This breaks migration because the
thread interrupt context registers are not correctly set.

Fix migration of hotplugged CPUs by restoring their context in the
'post_load' handler of the XiveTCTX model.

Fixes: 277dd3d771 ("spapr/xive: add migration support for KVM")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190813064853.29310-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-13 16:50:30 +10:00
Cédric Le Goater d0e9bc0407 spapr/xive: simplify spapr_irq_init_device() to remove the emulated init
The init_emu() handles are now empty. Remove them and rename
spapr_irq_init_device() to spapr_irq_init_kvm().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190614165920.12670-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Cédric Le Goater 981b1c6266 spapr/xive: rework the mapping the KVM memory regions
Today, the interrupt device is fully initialized at reset when the CAS
negotiation process has completed. Depending on the KVM capabilities,
the SpaprXive memory regions (ESB, TIMA) are initialized with a host
MMIO backend or a QEMU emulated backend. This results in a complex
initialization sequence partially done at realize and later at reset,
and some memory region leaks.

To simplify this sequence and to remove of the late initialization of
the emulated device which is required to be done only once, we
introduce new memory regions specific for KVM. These regions are
mapped as overlaps on top of the emulated device to make use of the
host MMIOs. Also provide proper cleanups of these regions when the
XIVE KVM device is destroyed to fix the leaks.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190614165920.12670-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Greg Kurz 330a21e3c4 xics/kvm: Add error propagation to ic*_set_kvm_state() functions
This allows errors happening there to be propagated up to spapr_irq,
just like XIVE already does.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156077921763.433243.4614327010172954196.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Greg Kurz eab9f191a0 xics/spapr: Rename xics_kvm_init()
Switch to using the connect/disconnect terminology like we already do for
XIVE.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156077920102.433243.6605099291134598170.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Greg Kurz 2fb4c6528e xics/spapr: Drop unused function declaration
Commit 9fb6eb7ca50c added the declaration of xics_spapr_connect(), which
has no implementation and no users.

This is a leftover from a previous iteration of this patch. Drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156077919546.433243.8748677531446035746.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Greg Kurz 7abc0c6d35 xics/spapr: Detect old KVM XICS on POWER9 hosts
Older KVMs on POWER9 don't support destroying/recreating a KVM XICS
device, which is required by 'dual' interrupt controller mode. This
causes QEMU to emit a warning when the guest is rebooted and to fall
back on XICS emulation:

qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable:
 Error on KVM_CREATE_DEVICE for XICS: File exists

If kernel irqchip is required, QEMU will thus exit when the guest is
first rebooted. Failing QEMU this late may be a painful experience
for the user.

Detect that and exit at machine init instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156044430517.125694.6207865998817342638.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Greg Kurz d9293c4843 xics/spapr: Register RTAS/hypercalls once at machine init
QEMU may crash when running a spapr machine in 'dual' interrupt controller
mode on some older (but not that old, eg. ubuntu 18.04.2) KVMs with partial
XIVE support:

qemu-system-ppc64: hw/ppc/spapr_rtas.c:411: spapr_rtas_register:
 Assertion `!name || !rtas_table[token].name' failed.

XICS is controlled by the guest thanks to a set of RTAS calls. Depending
on whether KVM XICS is used or not, the RTAS calls are handled by KVM or
QEMU. In both cases, QEMU needs to expose the RTAS calls to the guest
through the "rtas" node of the device tree.

The spapr_rtas_register() helper takes care of all of that: it adds the
RTAS call token to the "rtas" node and registers a QEMU callback to be
invoked when the guest issues the RTAS call. In the KVM XICS case, QEMU
registers a dummy callback that just prints an error since it isn't
supposed to be invoked, ever.

Historically, the XICS controller was setup during machine init and
released during final teardown. This changed when the 'dual' interrupt
controller mode was added to the spapr machine: in this case we need
to tear the XICS down and set it up again during machine reset. The
crash happens because we indeed have an incompatibility with older
KVMs that forces QEMU to fallback on emulated XICS, which tries to
re-registers the same RTAS calls.

This could be fixed by adding proper rollback that would unregister
RTAS calls on error. But since the emulated RTAS calls in QEMU can
now detect when they are mistakenly called while KVM XICS is in
use, it seems simpler to register them once and for all at machine
init. This fixes the crash and allows to remove some now useless
lines of code.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156044429963.125694.13710679451927268758.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Cédric Le Goater c29a0b0fb3 ppc/pnv: remove xscom_base field from PnvChip
It has now became useless with the previous patch.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190612174345.9799-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Cédric Le Goater 709044fd2d ppc/pnv: fix XSCOM MMIO base address for P9 machines with multiple chips
The PNV_XSCOM_BASE and PNV_XSCOM_SIZE macros are specific to POWER8
and they are used when the device tree is populated and the MMIO
region created, even for POWER9 chips. This is not too much of a
problem today because we don't have important devices on the second
chip, but we might have oneday (PHBs).

Fix by using the appropriate macros in case of P9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190612174345.9799-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02 09:43:58 +10:00
Peter Maydell a050901d4b ppc patch queue 2019-06-12
Next pull request against qemu-4.1.  The big thing here is adding
 support for hot plug of P2P bridges, and PCI devices under P2P bridges
 on the "pseries" machine (which doesn't use SHPC).  Other than that
 there's just a handful of fixes and small enhancements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl0AkgwACgkQbDjKyiDZ
 s5Jyug//cwxP+t1t2CNHtffKwiXFzuEKx9YSNE1V0wog6aB40EbPKU72FzCq6FfA
 lev+pZWV9AwVMzFYe4VM/7Lqh7WFMYDT3DOXaZwfANs4471vYtgvPi21L2TBj80d
 hMszlyLWMLY9ByOzCxIq3xnbivGpA94G2q9rKbwXdK4T/5i62Pe3SIfgG+gXiiwW
 +YlHWCPX0I1cJz2bBs9ElXdl7ONWnn+7uDf7gNfWkTKuiUq6Ps7mxzy3GhJ1T7nz
 OFKmQ5dKzLJsgOULSSun8kWpXBmnPffkM3+fCE07edrWZVor09fMCk4HvtfaRy2K
 FFa2Kvzn/V/70TL+44dsSX4QcwdcHQztiaMO7UGPq9CMswx5L7gsNmfX6zvK1Nrb
 1t7ORZKNJ72hMyvDPSMiGU2DpVjO3ZbBlSL4/xG8Qeal4An0kgkN5NcFlB/XEfnz
 dsKu9XzuGSeD1bWz1Mgcf1x7lPDBoHIKLcX6notZ8epP/otu4ywNFvAkPu4fk8s0
 4jQGajIT7328SmzpjXClsmiEskpKsEr7hQjPRhu0hFGrhVc+i9PjkmbDl0TYRAf6
 N6k6gJQAi+StJde2rcua1iS7Ra+Tka6QRKy+EctLqfqOKPb2VmkZ6fswQ3nfRRlT
 LgcTHt2iJcLeud2klVXs1e4pKXzXchkVyFL4ucvmyYG5VeimMzU=
 =ERgu
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190612' into staging

ppc patch queue 2019-06-12

Next pull request against qemu-4.1.  The big thing here is adding
support for hot plug of P2P bridges, and PCI devices under P2P bridges
on the "pseries" machine (which doesn't use SHPC).  Other than that
there's just a handful of fixes and small enhancements.

# gpg: Signature made Wed 12 Jun 2019 06:47:56 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.1-20190612:
  ppc/xive: Make XIVE generate the proper interrupt types
  ppc/pnv: activate the "dumpdtb" option on the powernv machine
  target/ppc: Use tcg_gen_gvec_bitsel
  spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges
  spapr: Direct all PCI hotplug to host bridge, rather than P2P bridge
  spapr: Don't use bus number for building DRC ids
  spapr: Clean up DRC index construction
  spapr: Clean up spapr_drc_populate_dt()
  spapr: Clean up dt creation for PCI buses
  spapr: Clean up device tree construction for PCI devices
  spapr: Clean up device node name generation for PCI devices
  target/ppc: Fix lxvw4x, lxvh8x and lxvb16x
  spapr_pci: Improve error message

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-06-12 14:43:47 +01:00
Markus Armbruster a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Benjamin Herrenschmidt 4aca978654 ppc/xive: Make XIVE generate the proper interrupt types
It should be generic Hypervisor Virtualization interrupts for HV
directed rings and traditional External Interrupts for the OS directed
ring.

Don't generate anything for the user ring as it isn't actually
supported.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190606174409.12502-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-06-12 10:41:50 +10:00
David Gibson 9e7d38e8a3 spapr: Clean up spapr_drc_populate_dt()
This makes some minor cleanups to spapr_drc_populate_dt(), renaming it to
the shorter and more idiomatic spapr_dt_drc() along the way.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12 10:41:49 +10:00
Greg Kurz 3725ef1a94 spapr: Don't migrate the hpt_maxpagesize cap to older machine types
Commit 0b8c89be7f7b added the hpt_maxpagesize capability to the migration
stream. This is okay for new machine types but it breaks backward migration
to older QEMUs, which don't expect the extra subsection.

Add a compatibility boolean flag to the sPAPR machine class and use it to
skip migration of the capability for machine types 4.0 and older. This
fixes migration to an older QEMU. Note that the destination will emit a
warning:

qemu-system-ppc64: warning: cap-hpt-max-page-size lower level (16) in incoming stream than on destination (24)

This is expected and harmless though. It is okay to migrate from a lower
HPT maximum page size (64k) to a greater one (16M).

Fixes: 0b8c89be7f7b "spapr: Add forgotten capability to migration stream"
Based-on: <20190522074016.10521-3-clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155853262675.1158324.17301777846476373459.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:47 +10:00
Cédric Le Goater 3f777abc71 spapr/irq: add KVM support to the 'dual' machine
The interrupt mode is chosen by the CAS negotiation process and
activated after a reset to take into account the required changes in
the machine. This brings new constraints on how the associated KVM IRQ
device is initialized.

Currently, each model takes care of the initialization of the KVM
device in their realize method but this is not possible anymore as the
initialization needs to be done globaly when the interrupt mode is
known, i.e. when machine is reseted. It also means that we need a way
to delete a KVM device when another mode is chosen.

Also, to support migration, the QEMU objects holding the state to
transfer should always be available but not necessarily activated.

The overall approach of this proposal is to initialize both interrupt
mode at the QEMU level to keep the IRQ number space in sync and to
allow switching from one mode to another. For the KVM side of things,
the whole initialization of the KVM device, sources and presenters, is
grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
handlers are modified accordingly to handle the init and the delete
sequences of the KVM device.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-15-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:46 +10:00
Cédric Le Goater cf435df697 spapr/irq: initialize the IRQ device only once
Add a check to make sure that the routine initializing the emulated
IRQ device is called once. We don't have much to test on the XICS
side, so we introduce a 'init' boolean under ICSState.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190513084245.25755-13-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:46 +10:00
Cédric Le Goater ae805ea907 spapr/irq: introduce a spapr_irq_init_device() helper
The way the XICS and the XIVE devices are initialized follows the same
pattern. First, try to connect to the KVM device and if not possible
fallback on the emulated device, unless a kernel_irqchip is required.
The spapr_irq_init_device() routine implements this sequence in
generic way using new sPAPR IRQ handlers ->init_emu() and ->init_kvm().

The XIVE init sequence is moved under the associated sPAPR IRQ
->init() handler. This will change again when KVM support is added for
the dual interrupt mode.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-12-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:46 +10:00
Cédric Le Goater 56b11587df spapr: introduce routines to delete the KVM IRQ device
If a new interrupt mode is chosen by CAS, the machine generates a
reset to reconfigure. At this point, the connection with the previous
KVM device needs to be closed and a new connection needs to opened
with the KVM device operating the chosen interrupt mode.

New routines are introduced to destroy the XICS and the XIVE KVM
devices. They make use of a new KVM device ioctl which destroys the
device and also disconnects the IRQ presenters from the vCPUs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:46 +10:00
Cédric Le Goater 277dd3d771 spapr/xive: add migration support for KVM
When the VM is stopped, the VM state handler stabilizes the XIVE IC
and marks the EQ pages dirty. These are then transferred to destination
before the transfer of the device vmstates starts.

The SpaprXive interrupt controller model captures the XIVE internal
tables, EAT and ENDT and the XiveTCTX model does the same for the
thread interrupt context registers.

At restart, the SpaprXive 'post_load' method restores all the XIVE
states. It is called by the sPAPR machine 'post_load' method, when all
XIVE states have been transferred and loaded.

Finally, the source states are restored in the VM change state handler
when the machine reaches the running state.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:46 +10:00
Cédric Le Goater 9b88cd7673 spapr/xive: introduce a VM state change handler
This handler is in charge of stabilizing the flow of event notifications
in the XIVE controller before migrating a guest. This is a requirement
before transferring the guest EQ pages to a destination.

When the VM is stopped, the handler sets the source PQs to PENDING to
stop the flow of events and to possibly catch a triggered interrupt
occuring while the VM is stopped. Their previous state is saved. The
XIVE controller is then synced through KVM to flush any in-flight
event notification and to stabilize the EQs. At this stage, the EQ
pages are marked dirty to make sure the EQ pages are transferred if a
migration sequence is in progress.

The previous configuration of the sources is restored when the VM
resumes, after a migration or a stop. If an interrupt was queued while
the VM was stopped, the handler simply generates the missing trigger.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:46 +10:00
Cédric Le Goater 7bfc759c02 spapr/xive: add state synchronization with KVM
This extends the KVM XIVE device backend with 'synchronize_state'
methods used to retrieve the state from KVM. The HW state of the
sources, the KVM device and the thread interrupt contexts are
collected for the monitor usage and also migration.

These get operations rely on their KVM counterpart in the host kernel
which acts as a proxy for OPAL, the host firmware. The set operations
will be added for migration support later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190513084245.25755-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:46 +10:00
Cédric Le Goater 0c575703e4 spapr/xive: add hcall support when under KVM
XIVE hcalls are all redirected to QEMU as none are on a fast path.
When necessary, QEMU invokes KVM through specific ioctls to perform
host operations. QEMU should have done the necessary checks before
calling KVM and, in case of failure, H_HARDWARE is simply returned.

H_INT_ESB is a special case that could have been handled under KVM
but the impact on performance was low when under QEMU. Here are some
figures :

    kernel irqchip      OFF          ON
    H_INT_ESB                    KVM   QEMU

    rtl8139 (LSI )      1.19     1.24  1.23  Gbits/sec
    virtio             31.80    42.30   --   Gbits/sec

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:45 +10:00
Cédric Le Goater 38afd772f8 spapr/xive: add KVM support
This introduces a set of helpers when KVM is in use, which create the
KVM XIVE device, initialize the interrupt sources at a KVM level and
connect the interrupt presenters to the vCPU.

They also handle the initialization of the TIMA and the source ESB
memory regions of the controller. These have a different type under
KVM. They are 'ram device' memory mappings, similarly to VFIO, exposed
to the guest and the associated VMAs on the host are populated
dynamically with the appropriate pages using a fault handler.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513084245.25755-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:45 +10:00
David Gibson 64d4a53431 spapr: Add forgotten capability to migration stream
spapr machine capabilities are supposed to be sent in the migration stream
so that we can sanity check the source and destination have compatible
configuration.  Unfortunately, when we added the hpt-max-page-size
capability, we forgot to add it to the migration state.  This means that we
can generate spurious warnings when both ends are configured for large
pages, or potentially fail to warn if the source is configured for huge
pages, but the destination is not.

Fixes: 2309832afd "spapr: Maximum (HPT) pagesize property"

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-05-29 11:39:45 +10:00
Cédric Le Goater 13df93244e spapr/xive: fix EQ page addresses above 64GB
The high order bits of the address of the OS event queue is stored in
bits [4-31] of word2 of the XIVE END internal structures and the low
order bits in word3. This structure is using Big Endian ordering and
computing the value requires some simple arithmetic which happens to
be wrong. The mask removing bits [0-3] of word2 is applied to the
wrong value and the resulting address is bogus when above 64GB.

Guests with more than 64GB of RAM will allocate pages for the OS event
queues which will reside above the 64GB limit. In this case, the XIVE
device model will wake up the CPUs in case of a notification, such as
IPIs, but the update of the event queue will be written at the wrong
place in memory. The result is uncertain as the guest memory is
trashed and IPI are not delivered.

Introduce a helper xive_end_qaddr() to compute this value correctly in
all places where it is used.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190508171946.657-3-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29 11:39:44 +10:00
Markus Armbruster a8b991b52d Clean up ill-advised or unusual header guards
Leading underscores are ill-advised because such identifiers are
reserved.  Trailing underscores are merely ugly.  Strip both.

Our header guards commonly end in _H.  Normalize the exceptions.

Done with scripts/clean-header-guards.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190315145123.28030-7-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Changes to slirp/ dropped, as we're about to spin it off]
2019-05-13 08:58:55 +02:00
Benjamin Herrenschmidt a2dd4e83e7 ppc/hash64: Rework R and C bit updates
With MT-TCG, we are now running translation in a racy way, thus
we need to mimic hardware when it comes to updating the R and
C bits, by doing byte stores.

The current "store_hpte" abstraction is ill suited for this, we
replace it with two separate callbacks for setting R and C.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190411080004.8690-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 11:37:57 +10:00
Cédric Le Goater 64db6c70dc spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
Removing RTAS handlers will become necessary when the new pseries
machine supporting multiple interrupt mode is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190321144914.19934-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
Alexey Kardashevskiy ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.

This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.

This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
David Gibson 0a794529bd spapr: Simplify handling of host-serial and host-model values
27461d69a0 "ppc: add host-serial and host-model machine attributes
(CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine
properties for spapr to explicitly control the values advertised to the
guest in device tree properties with the same names.

The previous behaviour on KVM was to unconditionally populate the device
tree with the real host serial number and model, which leaks possibly
sensitive information about the host to the guest.

To maintain compatibility for old machine types, we allowed those props
to be set to "passthrough" to take the value from the host as before.  Or
they could be set to "none" to explicitly omit the device tree items.

Special casing specific values on what's otherwise a user supplied string
is very ugly.  So, this patch simplifies things by implementing the
backwards compatibility in a different way: we have a machine class flag
set for the older machines, and we only load the host values into the
device tree if A) they're not set by the user and B) we have that flag set.

This does mean that the "passthrough" functionality is no longer available
with the current machine type.  That's ok though: if a user or management
layer really wants the information passed through they can read it
themselves (OpenStack Nova already does something similar for x86).

It also means the user can't explicitly ask for the values to be omitted
on the old machine types.  I think that's an acceptable trade-off: if you
care enough about not leaking the host information you can either move to
the new machine type, or use a dummy value for the properties.

For the new machine type, this also removes an odd inconsistency
between running on a POWER and non-POWER (or non-Linux) hosts: if the
host information couldn't be read from where we expect (in the host's
device tree as exposed by Linux), we'd fallback to omitting the guest
device tree items.

While we're there, improve some poorly worded comments, and the help text
for the properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2019-03-29 10:25:50 +11:00
David Gibson ce2918cbc3 spapr: Use CamelCase properly
The qemu coding standard is to use CamelCase for type and structure names,
and the pseries code follows that... sort of.  There are quite a lot of
places where we bend the rules in order to preserve the capitalization of
internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR".

That was a bad idea - it frequently leads to names ending up with hard to
read clusters of capital letters, and means they don't catch the eye as
type identifiers, which is kind of the point of the CamelCase convention in
the first place.

In short, keeping type identifiers look like CamelCase is more important
than preserving standard capitalization of internal "words".  So, this
patch renames a heap of spapr internal type names to a more standard
CamelCase.

In addition to case changes, we also make some other identifier renames:
  VIOsPAPR* -> SpaprVio*
    The reverse word ordering was only ever used to mitigate the capital
    cluster, so revert to the natural ordering.
  VIOsPAPRVTYDevice -> SpaprVioVty
  VIOsPAPRVLANDevice -> SpaprVioVlan
    Brevity, since the "Device" didn't add useful information
  sPAPRDRConnector -> SpaprDrc
  sPAPRDRConnectorClass -> SpaprDrcClass
    Brevity, and makes it clearer this is the same thing as a "DRC"
    mentioned in many other places in the code

This is 100% a mechanical search-and-replace patch.  It will, however,
conflict with essentially any and all outstanding patches touching the
spapr code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:05 +11:00
Cédric Le Goater 5dad902ce0 ppc/pnv: POWER9 XSCOM quad support
The POWER9 processor does not support per-core frequency control. The
cores are arranged in groups of four, along with their respective L2
and L3 caches, into a structure known as a Quad. The frequency must be
managed at the Quad level.

Provide a basic Quad model to fake the settings done by the firmware
on the Non-Cacheable Unit (NCU). Each core pair (EX) needs a special
BAR setting for the TIMA area of XIVE because it resides on the same
address on all chips.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-12-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 90ef386c74 ppc/pnv: extend XSCOM core support for POWER9
Provide a new class attribute to define XSCOM operations per CPU
family and add a couple of XSCOM addresses controlling the power
management states of the core on POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 6598a70d00 ppc/pnv: add a OCC model for POWER9
The OCC on POWER9 is very similar to the one found on POWER8. Provide
the same routines with P9 values for the registers and IRQ number.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 3233838cd1 ppc/pnv: add a OCC model class
To ease the introduction of the OCC model for POWER9, provide a new
class attributes to define XSCOM operations per CPU family and a PSI
IRQ number.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 8207b90604 ppc/pnv: add SerIRQ routing registers
This is just a simple reminder that SerIRQ routing should be
addressed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 15376c66fa ppc/pnv: add a LPC Controller model for POWER9
The LPC Controller on POWER9 is very similar to the one found on
POWER8 but accesses are now done via on MMIOs, without the XSCOM and
ECCB logic. The device tree is populated differently so we add a
specific POWER9 routine for the purpose.

SerIRQ routing is yet to be done.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 64d011d56e ppc/pnv: add a 'dt_isa_nodename' to the chip
The ISA bus has a different DT nodename on POWER9. Compute the name
when the PnvChip is realized, that is before it is used by the machine
to populate the device tree with the ISA devices.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 82514be28b ppc/pnv: add a LPC Controller class model
It will ease the introduction of the LPC Controller model for POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater c38536bc80 ppc/pnv: add a PSI bridge model for POWER9
The PSI bridge on POWER9 is very similar to POWER8. The BAR is still
set through XSCOM but the controls are now entirely done with MMIOs.
More interrupts are defined and the interrupt controller interface has
changed to XIVE. The POWER9 model is a first example of the usage of
the notify() handler of the XiveNotifier interface, linking the PSI
XiveSource to its owning device model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater ae85605531 ppc/pnv: add a PSI bridge class model
To ease the introduction of the PSI bridge model for POWER9, abstract
the POWER chip differences in a PnvPsi class model and introduce a
specific Pnv8Psi type for POWER8. POWER8 interface to the interrupt
controller is still XICS whereas POWER9 uses the new XIVE model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Alexey Kardashevskiy 5f36666722 spapr_iommu: Do not replay mappings from just created DMA window
On sPAPR vfio_listener_region_add() is called in 2 situations:
1. a new listener is registered from vfio_connect_container();
2. a new IOMMU Memory Region is added from rtas_ibm_create_pe_dma_window().

In both cases vfio_listener_region_add() calls
memory_region_iommu_replay() to notify newly registered IOMMU notifiers
about existing mappings which is totally desirable for case 1.

However for case 2 it is nothing but noop as the window has just been
created and has no valid mappings so replaying those does not do anything.
It is barely noticeable with usual guests but if the window happens to be
really big, such no-op replay might take minutes and trigger RCU stall
warnings in the guest.

For example, a upcoming GPU RAM memory region mapped at 64TiB (right
after SPAPR_PCI_LIMIT) causes a 64bit DMA window to be at least 128TiB
which is (128<<40)/0x10000=2.147.483.648 TCEs to replay.

This mitigates the problem by adding an "skipping_replay" flag to
sPAPRTCETable and defining sPAPR own IOMMU MR replay() hook which does
exactly the same thing as the generic one except it returns early if
@skipping_replay==true.

Another way of fixing this would be delaying replay till the very first
H_PUT_TCE but this does not work if in-kernel H_PUT_TCE handler is
enabled (a likely case).

When "ibm,create-pe-dma-window" is complete, the guest will map only
required regions of the huge DMA window.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190307050518.64968-2-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater d8e4aad533 ppc/pnv: introduce a new pic_print_info() operation to the chip model
The POWER9 and POWER8 processors have different interrupt controllers,
and reporting their state requires calling different helper routines.

However, the interrupt presenters are still handled in the higher
level pic_print_info() routine because they are not related to the
chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater eb859a27e1 ppc/pnv: introduce a new dt_populate() operation to the chip model
The POWER9 and POWER8 processors have a different set of devices and a
different device tree layout.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 2dfa91a2aa ppc/pnv: add a XIVE interrupt controller model for POWER9
This is a simple model of the POWER9 XIVE interrupt controller for the
PowerNV machine which only addresses the needs of the skiboot
firmware. The PowerNV model reuses the common XIVE framework developed
for sPAPR as the fundamentals aspects are quite the same. The
difference are outlined below.

The controller initial BAR configuration is performed using the XSCOM
bus from there, MMIO are used for further configuration.

The MMIO regions exposed are :

 - Interrupt controller registers
 - ESB pages for IPIs and ENDs
 - Presenter MMIO (Not used)
 - Thread Interrupt Management Area MMIO, direct and indirect

The virtualization controller MMIO region containing the IPI ESB pages
and END ESB pages is sub-divided into "sets" which map portions of the
VC region to the different ESB pages. These are modeled with custom
address spaces and the XiveSource and XiveENDSource objects are sized
to the maximum allowed by HW. The memory regions are resized at
run-time using the configuration of EDT set translation table provided
by the firmware.

The XIVE virtualization structure tables (EAT, ENDT, NVTT) are now in
the machine RAM and not in the hypervisor anymore. The firmware
(skiboot) configures these tables using Virtual Structure Descriptor
defining the characteristics of each table : SBE, EAS, END and
NVT. These are later used to access the virtual interrupt entries. The
internal cache of these tables in the interrupt controller is updated
and invalidated using a set of registers.

Still to address to complete the model but not fully required is the
support for block grouping. Escalation support will be necessary for
KVM guests.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 956b8f468d ppc/pnv: change the CPU machine_data presenter type to Object *
The POWER9 PowerNV machine will use a XIVE interrupt presenter type.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater a58a18adee ppc/pnv: export the xive_router_notify() routine
The PowerNV machine with need to encode the block id in the source
interrupt number before forwarding the source event notification to
the Router.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater f9b9db3860 ppc/xive: export the TIMA memory accessors
The PowerNV machine can perform indirect loads and stores on the TIMA
on behalf of another CPU. Give the controller the possibility to call
the TIMA memory accessors with a XiveTCTX of its choice.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 051e2973bf ppc: externalize ppc_get_vcpu_by_pir()
We will use it to get the CPU interrupt presenter in XIVE when the
TIMA is accessed from the indirect page.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
David Gibson e075623aa5 spapr: Force SPAPR_MEMORY_BLOCK_SIZE to be a hwaddr (64-bit)
SPAPR_MEMORY_BLOCK_SIZE is logically a difference in memory addresses, and
hence of type hwaddr which is 64-bit.  Previously it wasn't marked as such
which means that it could be treated as 32-bit.  That will work in some
circumstances but if multiplied by another 32-bit value it could lead to
a 32-bit overflow and an incorrect result.

One specific instance of this in spapr_lmb_dt_populate() was spotted by
Coverity (CID 1399145).

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh 8ff43ee404 target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST
Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate
the requirement for a hw-assisted version of the count cache flush
workaround.

The count cache flush workaround is a software workaround which can be
used to flush the count cache on context switch. Some revisions of
hardware may have a hardware accelerated flush, in which case the
software flush can be shortened. This cap is used to set the
availability of such hardware acceleration for the count cache flush
routine.

The availability of such hardware acceleration is indicated by the
H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics
returned from the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com>
[dwg: Small style fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh 399b2896d4 target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS
The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability
for mitigations for indirect branch speculation. Currently the available
values are broken (default), fixed-ibs (fixed by serialising indirect
branches) and fixed-ccd (fixed by diabling the count cache).

Introduce a new value for this capability denoted workaround, meaning that
software can work around the issue by flushing the count cache on
context switch. This option is available if the hypervisor sets the
H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from
the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh c982f5cf9a target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER
Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
availability of the large decrementer for a guest.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301024317.22137-1-sjitindarsingh@gmail.com>
[dwg: Trivial style fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Greg Kurz bb2bdd812e spapr: add hotplug hooks for PHB hotplug
Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register spapr machine as the handler for the
main system bus.

Provide the usual pre-plug, plug and unplug-request handlers.

Move the checking of the PHB index to the pre-plug handler. It is okay
to do that and assert in the realize function because the pre-plug
handler is always called, even for the oldest machine types we support.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(Fixed interrupt controller phandle in "interrupt-map" and
 TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth 962b6c3650 spapr: create DR connectors for PHBs
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz ad62bff638 spapr_irq: Expose the phandle of the interrupt controller
This will be used by PHB hotplug in order to create the "interrupt-map"
property of the PHB node.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 743ed566c1 spapr: Expose the name of the interrupt controller node
This will be needed by PHB hotplug in order to access the "phandle"
property of the interrupt controller node.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 6cead90c5c xics: Write source state to KVM at claim time
The pseries machine only uses LSIs to support legacy PCI devices. Every
PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming
in-kernel XIVE), QEMU synchronizes the state of all irqs, including these
LSIs, later on at machine reset.

In order to support PHB hotplug, we need a way to tell KVM about the LSIs
that doesn't require a machine reset. An easy way to do that is to always
inform KVM when an interrupt is claimed, which really isn't a performance
path.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 09d876ce2c spapr/drc: Drop spapr_drc_attach() fdt argument
All DRC subtypes have been converted to generate the FDT fragment at
configure connector time instead of attach time. The fdt and fdt_offset
arguments of spapr_drc_attach() aren't needed anymore. Drop them and
make the implementation of the dt_populate() method mandatory.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 345b12b99e spapr: Generate FDT fragment for CPUs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 62d38c9bd3 spapr: Generate FDT fragment for LMBs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz d9c95c71ac spapr_drc: Allow FDT fragment to be added later
The current logic is to provide the FDT fragment when attaching a device
to a DRC. This works perfectly fine for our current hotplug support, but
soon we will add support for PHB hotplug which has some constraints, that
CPU, PCI and LMB devices don't seem to have.

The first constraint is that the "ibm,dma-window" property of the PHB
node requires the IOMMU to be configured, ie, spapr_tce_table_enable()
has been called, which happens during PHB reset. It is okay in the case
of hotplug since the device is reset before the hotplug handler is
called. On the contrary with coldplug, the hotplug handler is called
first and device is only reset during the initial system reset. Trying
to create the FDT fragment on the hotplug path in this case, would
result in somthing like this:

ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >;

This will cause linux in the guest to panic, by simply removing and
re-adding the PHB using the drmgr command:

	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
	if (!page)
		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);

The second and maybe more problematic constraint is that the
"interrupt-map" property needs to reference the interrupt controller
node using the very same phandle that SLOF has already exposed to the
guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall
at some point to know about this phandle. With the latest QEMU and SLOF,
this happens when SLOF gets quiesced. This means that if the PHB gets
hotplugged after CAS but before SLOF quiesce, then we're sure that the
phandle is not known when the hotplug handler is called.

The FDT is only needed when the guest first invokes RTAS to configure
the connector actually, long after SLOF quiesce. Let's postpone the
creation of FDT fragments for PHBs to rtas_ibm_configure_connector().

Since we only need this for PHBs, introduce a new method in the base
DRC class for that. DRC subtypes will be converted to use it in
subsequent patches.

Allow spapr_drc_attach() to be passed a NULL fdt argument if the method
is available. When all DRC subtypes have been converted, the fdt argument
will eventually disappear.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 00fd075e18 target/ppc/spapr: Set LPCR:HR when using Radix mode
The HW relies on LPCR:HR along with the PATE to determine whether
to use Radix or Hash mode. In fact it uses LPCR:HR more commonly
than the PATE.

For us, it's also more efficient to do so, especially since unlike
the HW we do not maintain a cache of the current PATE and HV PATE
in a generic place.

Prepare the grounds for that by ensuring that LPCR:HR is set
properly on SPAPR machines.

Another option would have been to use a callback to get the PATE
but this gets messy when implementing bare metal support, it's
much simpler (and faster) to use LPCR.

Since existing migration streams may not have it, fix it up in
spapr_post_load() as well based on the pseudo-PATE entry that
we keep.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Prasad J Pandit 27461d69a0 ppc: add host-serial and host-model machine attributes (CVE-2019-8934)
On ppc hosts, hypervisor shares following system attributes

  - /proc/device-tree/system-id
  - /proc/device-tree/model

with a guest. This could lead to information leakage and misuse.[*]
Add machine attributes to control such system information exposure
to a guest.

[*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028

Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20190218181349.23885-1-ppandit@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 67afe7759d target/ppc: Add POWER9 external interrupt model
Adds support for the Hypervisor directed interrupts in addition to the
OS ones.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - modified the icp_realize() and xive_tctx_realize() to take
        into account explicitely the POWER9 interrupt model
      - introduced a specific power9_set_irq for POWER9 ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215161648.9600-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:24 +11:00
Greg Kurz 3272752a8b xics: Drop the KVM ICS class
The KVM ICS class isn't used anymore. Drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023084177.1011724.14693955932559990358.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:52:08 +11:00
Greg Kurz 557b456729 xics: Handle KVM interrupt presentation from "simple" ICS code
We want to use the "simple" ICS type in both KVM and non-KVM setups.
Teach the "simple" ICS how to present interrupts to KVM and adapt
sPAPR accordingly.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023082996.1011724.16237920586343905010.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:43:19 +11:00
Greg Kurz d80b2ccfa7 xics: Explicitely call KVM ICS methods from the common code
The pre_save(), post_load() and synchronize_state() methods of the
ICSStateClass type are really KVM only things. Make that obvious
by dropping the indirections and directly calling the KVM functions
instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023081817.1011724.14078777320394028836.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:39:24 +11:00
Greg Kurz 8c1ced677d xics: Drop the KVM ICP class
The KVM ICP class isn't used anymore. Drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023081228.1011724.12474992370439652538.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:37:33 +11:00
Greg Kurz 56af66566d spapr/irq: Use the base ICP class for KVM
The base ICP class knows how to interact with KVM. Adapt sPAPR to use it
instead of the ICP KVM class.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023080638.1011724.792095453419098948.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:34:56 +11:00
Greg Kurz 8e6e6efef7 xics: Handle KVM ICP realize from the common code
The realization of KVM ICP currently follows the parent_realize logic,
which is a bit overkill here. Also we want to get rid of the KVM ICP
class. Explicitely call icp_kvm_realize() from the base ICP realize
function.

Note that ICPStateClass::parent_realize is retained because powernv
needs it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023080049.1011724.15423463482790260696.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:34:05 +11:00
Greg Kurz d82f397183 xics: Handle KVM ICP reset from the common code
The KVM ICP reset handler simply writes the ICP state to KVM. This
doesn't need the overkill parent_reset logic we have today. Call
icp_set_kvm_state() from the base ICP reset function instead.

Since there are no other users for ICPStateClass::parent_reset, and
it isn't currently expected to change, drop it as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023079461.1011724.12644984391500635645.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:29:55 +11:00
Greg Kurz 0e5c7fad9c xics: Explicitely call KVM ICP methods from the common code
The pre_save(), post_load() and synchronize_state() methods of the
ICPStateClass type are really KVM only things. Make that obvious
by dropping the indirections and directly calling the KVM functions
instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023078871.1011724.3083923389814185598.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:14:37 +11:00
Cédric Le Goater 2e66cdb715 spapr/irq: add an 'nr_irq' parameter to initialize the backend.
When using the 'dual' interrupt mode, the source numbers of both sPAPR
IRQ backends are aligned to share a common IRQ number space and to use
a similar mapping of the machine qemu_irq array which is indexed by
the source number.

The XICS IRQ number range initially being [ 0x1000 - 0x2000 ], this
requires to change the XICS ICSState offset to 0 and to provision for
an extra 4K of source numbers and qemu_irqs which will never be used
by the machine when running under the XICS interrupt mode. This is not
an optimal solution.

Change the init() method to allocate an IRQ number space of the
expected size for the XICS sPAPR IRQ backend. It breaks the interrupt
signaling when under the 'dual' mode because source numbers have
unexpected values but next patch will fix that.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190213210756.27032-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00