Commit Graph

758 Commits

Author SHA1 Message Date
Markus Armbruster a8b991b52d Clean up ill-advised or unusual header guards
Leading underscores are ill-advised because such identifiers are
reserved.  Trailing underscores are merely ugly.  Strip both.

Our header guards commonly end in _H.  Normalize the exceptions.

Done with scripts/clean-header-guards.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190315145123.28030-7-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Changes to slirp/ dropped, as we're about to spin it off]
2019-05-13 08:58:55 +02:00
Benjamin Herrenschmidt a2dd4e83e7 ppc/hash64: Rework R and C bit updates
With MT-TCG, we are now running translation in a racy way, thus
we need to mimic hardware when it comes to updating the R and
C bits, by doing byte stores.

The current "store_hpte" abstraction is ill suited for this, we
replace it with two separate callbacks for setting R and C.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190411080004.8690-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 11:37:57 +10:00
Cédric Le Goater 64db6c70dc spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
Removing RTAS handlers will become necessary when the new pseries
machine supporting multiple interrupt mode is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190321144914.19934-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
Alexey Kardashevskiy ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.

This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.

This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
David Gibson 0a794529bd spapr: Simplify handling of host-serial and host-model values
27461d69a0 "ppc: add host-serial and host-model machine attributes
(CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine
properties for spapr to explicitly control the values advertised to the
guest in device tree properties with the same names.

The previous behaviour on KVM was to unconditionally populate the device
tree with the real host serial number and model, which leaks possibly
sensitive information about the host to the guest.

To maintain compatibility for old machine types, we allowed those props
to be set to "passthrough" to take the value from the host as before.  Or
they could be set to "none" to explicitly omit the device tree items.

Special casing specific values on what's otherwise a user supplied string
is very ugly.  So, this patch simplifies things by implementing the
backwards compatibility in a different way: we have a machine class flag
set for the older machines, and we only load the host values into the
device tree if A) they're not set by the user and B) we have that flag set.

This does mean that the "passthrough" functionality is no longer available
with the current machine type.  That's ok though: if a user or management
layer really wants the information passed through they can read it
themselves (OpenStack Nova already does something similar for x86).

It also means the user can't explicitly ask for the values to be omitted
on the old machine types.  I think that's an acceptable trade-off: if you
care enough about not leaking the host information you can either move to
the new machine type, or use a dummy value for the properties.

For the new machine type, this also removes an odd inconsistency
between running on a POWER and non-POWER (or non-Linux) hosts: if the
host information couldn't be read from where we expect (in the host's
device tree as exposed by Linux), we'd fallback to omitting the guest
device tree items.

While we're there, improve some poorly worded comments, and the help text
for the properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2019-03-29 10:25:50 +11:00
David Gibson ce2918cbc3 spapr: Use CamelCase properly
The qemu coding standard is to use CamelCase for type and structure names,
and the pseries code follows that... sort of.  There are quite a lot of
places where we bend the rules in order to preserve the capitalization of
internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR".

That was a bad idea - it frequently leads to names ending up with hard to
read clusters of capital letters, and means they don't catch the eye as
type identifiers, which is kind of the point of the CamelCase convention in
the first place.

In short, keeping type identifiers look like CamelCase is more important
than preserving standard capitalization of internal "words".  So, this
patch renames a heap of spapr internal type names to a more standard
CamelCase.

In addition to case changes, we also make some other identifier renames:
  VIOsPAPR* -> SpaprVio*
    The reverse word ordering was only ever used to mitigate the capital
    cluster, so revert to the natural ordering.
  VIOsPAPRVTYDevice -> SpaprVioVty
  VIOsPAPRVLANDevice -> SpaprVioVlan
    Brevity, since the "Device" didn't add useful information
  sPAPRDRConnector -> SpaprDrc
  sPAPRDRConnectorClass -> SpaprDrcClass
    Brevity, and makes it clearer this is the same thing as a "DRC"
    mentioned in many other places in the code

This is 100% a mechanical search-and-replace patch.  It will, however,
conflict with essentially any and all outstanding patches touching the
spapr code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:05 +11:00
Cédric Le Goater 5dad902ce0 ppc/pnv: POWER9 XSCOM quad support
The POWER9 processor does not support per-core frequency control. The
cores are arranged in groups of four, along with their respective L2
and L3 caches, into a structure known as a Quad. The frequency must be
managed at the Quad level.

Provide a basic Quad model to fake the settings done by the firmware
on the Non-Cacheable Unit (NCU). Each core pair (EX) needs a special
BAR setting for the TIMA area of XIVE because it resides on the same
address on all chips.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-12-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 90ef386c74 ppc/pnv: extend XSCOM core support for POWER9
Provide a new class attribute to define XSCOM operations per CPU
family and add a couple of XSCOM addresses controlling the power
management states of the core on POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 6598a70d00 ppc/pnv: add a OCC model for POWER9
The OCC on POWER9 is very similar to the one found on POWER8. Provide
the same routines with P9 values for the registers and IRQ number.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 3233838cd1 ppc/pnv: add a OCC model class
To ease the introduction of the OCC model for POWER9, provide a new
class attributes to define XSCOM operations per CPU family and a PSI
IRQ number.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 8207b90604 ppc/pnv: add SerIRQ routing registers
This is just a simple reminder that SerIRQ routing should be
addressed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 15376c66fa ppc/pnv: add a LPC Controller model for POWER9
The LPC Controller on POWER9 is very similar to the one found on
POWER8 but accesses are now done via on MMIOs, without the XSCOM and
ECCB logic. The device tree is populated differently so we add a
specific POWER9 routine for the purpose.

SerIRQ routing is yet to be done.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 64d011d56e ppc/pnv: add a 'dt_isa_nodename' to the chip
The ISA bus has a different DT nodename on POWER9. Compute the name
when the PnvChip is realized, that is before it is used by the machine
to populate the device tree with the ISA devices.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 82514be28b ppc/pnv: add a LPC Controller class model
It will ease the introduction of the LPC Controller model for POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater c38536bc80 ppc/pnv: add a PSI bridge model for POWER9
The PSI bridge on POWER9 is very similar to POWER8. The BAR is still
set through XSCOM but the controls are now entirely done with MMIOs.
More interrupts are defined and the interrupt controller interface has
changed to XIVE. The POWER9 model is a first example of the usage of
the notify() handler of the XiveNotifier interface, linking the PSI
XiveSource to its owning device model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater ae85605531 ppc/pnv: add a PSI bridge class model
To ease the introduction of the PSI bridge model for POWER9, abstract
the POWER chip differences in a PnvPsi class model and introduce a
specific Pnv8Psi type for POWER8. POWER8 interface to the interrupt
controller is still XICS whereas POWER9 uses the new XIVE model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Alexey Kardashevskiy 5f36666722 spapr_iommu: Do not replay mappings from just created DMA window
On sPAPR vfio_listener_region_add() is called in 2 situations:
1. a new listener is registered from vfio_connect_container();
2. a new IOMMU Memory Region is added from rtas_ibm_create_pe_dma_window().

In both cases vfio_listener_region_add() calls
memory_region_iommu_replay() to notify newly registered IOMMU notifiers
about existing mappings which is totally desirable for case 1.

However for case 2 it is nothing but noop as the window has just been
created and has no valid mappings so replaying those does not do anything.
It is barely noticeable with usual guests but if the window happens to be
really big, such no-op replay might take minutes and trigger RCU stall
warnings in the guest.

For example, a upcoming GPU RAM memory region mapped at 64TiB (right
after SPAPR_PCI_LIMIT) causes a 64bit DMA window to be at least 128TiB
which is (128<<40)/0x10000=2.147.483.648 TCEs to replay.

This mitigates the problem by adding an "skipping_replay" flag to
sPAPRTCETable and defining sPAPR own IOMMU MR replay() hook which does
exactly the same thing as the generic one except it returns early if
@skipping_replay==true.

Another way of fixing this would be delaying replay till the very first
H_PUT_TCE but this does not work if in-kernel H_PUT_TCE handler is
enabled (a likely case).

When "ibm,create-pe-dma-window" is complete, the guest will map only
required regions of the huge DMA window.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190307050518.64968-2-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater d8e4aad533 ppc/pnv: introduce a new pic_print_info() operation to the chip model
The POWER9 and POWER8 processors have different interrupt controllers,
and reporting their state requires calling different helper routines.

However, the interrupt presenters are still handled in the higher
level pic_print_info() routine because they are not related to the
chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater eb859a27e1 ppc/pnv: introduce a new dt_populate() operation to the chip model
The POWER9 and POWER8 processors have a different set of devices and a
different device tree layout.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 2dfa91a2aa ppc/pnv: add a XIVE interrupt controller model for POWER9
This is a simple model of the POWER9 XIVE interrupt controller for the
PowerNV machine which only addresses the needs of the skiboot
firmware. The PowerNV model reuses the common XIVE framework developed
for sPAPR as the fundamentals aspects are quite the same. The
difference are outlined below.

The controller initial BAR configuration is performed using the XSCOM
bus from there, MMIO are used for further configuration.

The MMIO regions exposed are :

 - Interrupt controller registers
 - ESB pages for IPIs and ENDs
 - Presenter MMIO (Not used)
 - Thread Interrupt Management Area MMIO, direct and indirect

The virtualization controller MMIO region containing the IPI ESB pages
and END ESB pages is sub-divided into "sets" which map portions of the
VC region to the different ESB pages. These are modeled with custom
address spaces and the XiveSource and XiveENDSource objects are sized
to the maximum allowed by HW. The memory regions are resized at
run-time using the configuration of EDT set translation table provided
by the firmware.

The XIVE virtualization structure tables (EAT, ENDT, NVTT) are now in
the machine RAM and not in the hypervisor anymore. The firmware
(skiboot) configures these tables using Virtual Structure Descriptor
defining the characteristics of each table : SBE, EAS, END and
NVT. These are later used to access the virtual interrupt entries. The
internal cache of these tables in the interrupt controller is updated
and invalidated using a set of registers.

Still to address to complete the model but not fully required is the
support for block grouping. Escalation support will be necessary for
KVM guests.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 956b8f468d ppc/pnv: change the CPU machine_data presenter type to Object *
The POWER9 PowerNV machine will use a XIVE interrupt presenter type.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater a58a18adee ppc/pnv: export the xive_router_notify() routine
The PowerNV machine with need to encode the block id in the source
interrupt number before forwarding the source event notification to
the Router.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater f9b9db3860 ppc/xive: export the TIMA memory accessors
The PowerNV machine can perform indirect loads and stores on the TIMA
on behalf of another CPU. Give the controller the possibility to call
the TIMA memory accessors with a XiveTCTX of its choice.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 051e2973bf ppc: externalize ppc_get_vcpu_by_pir()
We will use it to get the CPU interrupt presenter in XIVE when the
TIMA is accessed from the indirect page.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
David Gibson e075623aa5 spapr: Force SPAPR_MEMORY_BLOCK_SIZE to be a hwaddr (64-bit)
SPAPR_MEMORY_BLOCK_SIZE is logically a difference in memory addresses, and
hence of type hwaddr which is 64-bit.  Previously it wasn't marked as such
which means that it could be treated as 32-bit.  That will work in some
circumstances but if multiplied by another 32-bit value it could lead to
a 32-bit overflow and an incorrect result.

One specific instance of this in spapr_lmb_dt_populate() was spotted by
Coverity (CID 1399145).

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh 8ff43ee404 target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST
Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate
the requirement for a hw-assisted version of the count cache flush
workaround.

The count cache flush workaround is a software workaround which can be
used to flush the count cache on context switch. Some revisions of
hardware may have a hardware accelerated flush, in which case the
software flush can be shortened. This cap is used to set the
availability of such hardware acceleration for the count cache flush
routine.

The availability of such hardware acceleration is indicated by the
H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics
returned from the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com>
[dwg: Small style fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh 399b2896d4 target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS
The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability
for mitigations for indirect branch speculation. Currently the available
values are broken (default), fixed-ibs (fixed by serialising indirect
branches) and fixed-ccd (fixed by diabling the count cache).

Introduce a new value for this capability denoted workaround, meaning that
software can work around the issue by flushing the count cache on
context switch. This option is available if the hypervisor sets the
H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from
the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh c982f5cf9a target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER
Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
availability of the large decrementer for a guest.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301024317.22137-1-sjitindarsingh@gmail.com>
[dwg: Trivial style fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Greg Kurz bb2bdd812e spapr: add hotplug hooks for PHB hotplug
Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register spapr machine as the handler for the
main system bus.

Provide the usual pre-plug, plug and unplug-request handlers.

Move the checking of the PHB index to the pre-plug handler. It is okay
to do that and assert in the realize function because the pre-plug
handler is always called, even for the oldest machine types we support.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(Fixed interrupt controller phandle in "interrupt-map" and
 TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth 962b6c3650 spapr: create DR connectors for PHBs
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz ad62bff638 spapr_irq: Expose the phandle of the interrupt controller
This will be used by PHB hotplug in order to create the "interrupt-map"
property of the PHB node.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 743ed566c1 spapr: Expose the name of the interrupt controller node
This will be needed by PHB hotplug in order to access the "phandle"
property of the interrupt controller node.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 6cead90c5c xics: Write source state to KVM at claim time
The pseries machine only uses LSIs to support legacy PCI devices. Every
PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming
in-kernel XIVE), QEMU synchronizes the state of all irqs, including these
LSIs, later on at machine reset.

In order to support PHB hotplug, we need a way to tell KVM about the LSIs
that doesn't require a machine reset. An easy way to do that is to always
inform KVM when an interrupt is claimed, which really isn't a performance
path.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 09d876ce2c spapr/drc: Drop spapr_drc_attach() fdt argument
All DRC subtypes have been converted to generate the FDT fragment at
configure connector time instead of attach time. The fdt and fdt_offset
arguments of spapr_drc_attach() aren't needed anymore. Drop them and
make the implementation of the dt_populate() method mandatory.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 345b12b99e spapr: Generate FDT fragment for CPUs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 62d38c9bd3 spapr: Generate FDT fragment for LMBs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz d9c95c71ac spapr_drc: Allow FDT fragment to be added later
The current logic is to provide the FDT fragment when attaching a device
to a DRC. This works perfectly fine for our current hotplug support, but
soon we will add support for PHB hotplug which has some constraints, that
CPU, PCI and LMB devices don't seem to have.

The first constraint is that the "ibm,dma-window" property of the PHB
node requires the IOMMU to be configured, ie, spapr_tce_table_enable()
has been called, which happens during PHB reset. It is okay in the case
of hotplug since the device is reset before the hotplug handler is
called. On the contrary with coldplug, the hotplug handler is called
first and device is only reset during the initial system reset. Trying
to create the FDT fragment on the hotplug path in this case, would
result in somthing like this:

ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >;

This will cause linux in the guest to panic, by simply removing and
re-adding the PHB using the drmgr command:

	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
	if (!page)
		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);

The second and maybe more problematic constraint is that the
"interrupt-map" property needs to reference the interrupt controller
node using the very same phandle that SLOF has already exposed to the
guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall
at some point to know about this phandle. With the latest QEMU and SLOF,
this happens when SLOF gets quiesced. This means that if the PHB gets
hotplugged after CAS but before SLOF quiesce, then we're sure that the
phandle is not known when the hotplug handler is called.

The FDT is only needed when the guest first invokes RTAS to configure
the connector actually, long after SLOF quiesce. Let's postpone the
creation of FDT fragments for PHBs to rtas_ibm_configure_connector().

Since we only need this for PHBs, introduce a new method in the base
DRC class for that. DRC subtypes will be converted to use it in
subsequent patches.

Allow spapr_drc_attach() to be passed a NULL fdt argument if the method
is available. When all DRC subtypes have been converted, the fdt argument
will eventually disappear.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 00fd075e18 target/ppc/spapr: Set LPCR:HR when using Radix mode
The HW relies on LPCR:HR along with the PATE to determine whether
to use Radix or Hash mode. In fact it uses LPCR:HR more commonly
than the PATE.

For us, it's also more efficient to do so, especially since unlike
the HW we do not maintain a cache of the current PATE and HV PATE
in a generic place.

Prepare the grounds for that by ensuring that LPCR:HR is set
properly on SPAPR machines.

Another option would have been to use a callback to get the PATE
but this gets messy when implementing bare metal support, it's
much simpler (and faster) to use LPCR.

Since existing migration streams may not have it, fix it up in
spapr_post_load() as well based on the pseudo-PATE entry that
we keep.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Prasad J Pandit 27461d69a0 ppc: add host-serial and host-model machine attributes (CVE-2019-8934)
On ppc hosts, hypervisor shares following system attributes

  - /proc/device-tree/system-id
  - /proc/device-tree/model

with a guest. This could lead to information leakage and misuse.[*]
Add machine attributes to control such system information exposure
to a guest.

[*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028

Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20190218181349.23885-1-ppandit@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 67afe7759d target/ppc: Add POWER9 external interrupt model
Adds support for the Hypervisor directed interrupts in addition to the
OS ones.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - modified the icp_realize() and xive_tctx_realize() to take
        into account explicitely the POWER9 interrupt model
      - introduced a specific power9_set_irq for POWER9 ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215161648.9600-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:24 +11:00
Greg Kurz 3272752a8b xics: Drop the KVM ICS class
The KVM ICS class isn't used anymore. Drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023084177.1011724.14693955932559990358.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:52:08 +11:00
Greg Kurz 557b456729 xics: Handle KVM interrupt presentation from "simple" ICS code
We want to use the "simple" ICS type in both KVM and non-KVM setups.
Teach the "simple" ICS how to present interrupts to KVM and adapt
sPAPR accordingly.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023082996.1011724.16237920586343905010.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:43:19 +11:00
Greg Kurz d80b2ccfa7 xics: Explicitely call KVM ICS methods from the common code
The pre_save(), post_load() and synchronize_state() methods of the
ICSStateClass type are really KVM only things. Make that obvious
by dropping the indirections and directly calling the KVM functions
instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023081817.1011724.14078777320394028836.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:39:24 +11:00
Greg Kurz 8c1ced677d xics: Drop the KVM ICP class
The KVM ICP class isn't used anymore. Drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023081228.1011724.12474992370439652538.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:37:33 +11:00
Greg Kurz 56af66566d spapr/irq: Use the base ICP class for KVM
The base ICP class knows how to interact with KVM. Adapt sPAPR to use it
instead of the ICP KVM class.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023080638.1011724.792095453419098948.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:34:56 +11:00
Greg Kurz 8e6e6efef7 xics: Handle KVM ICP realize from the common code
The realization of KVM ICP currently follows the parent_realize logic,
which is a bit overkill here. Also we want to get rid of the KVM ICP
class. Explicitely call icp_kvm_realize() from the base ICP realize
function.

Note that ICPStateClass::parent_realize is retained because powernv
needs it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023080049.1011724.15423463482790260696.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:34:05 +11:00
Greg Kurz d82f397183 xics: Handle KVM ICP reset from the common code
The KVM ICP reset handler simply writes the ICP state to KVM. This
doesn't need the overkill parent_reset logic we have today. Call
icp_set_kvm_state() from the base ICP reset function instead.

Since there are no other users for ICPStateClass::parent_reset, and
it isn't currently expected to change, drop it as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023079461.1011724.12644984391500635645.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:29:55 +11:00
Greg Kurz 0e5c7fad9c xics: Explicitely call KVM ICP methods from the common code
The pre_save(), post_load() and synchronize_state() methods of the
ICPStateClass type are really KVM only things. Make that obvious
by dropping the indirections and directly calling the KVM functions
instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023078871.1011724.3083923389814185598.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:14:37 +11:00
Cédric Le Goater 2e66cdb715 spapr/irq: add an 'nr_irq' parameter to initialize the backend.
When using the 'dual' interrupt mode, the source numbers of both sPAPR
IRQ backends are aligned to share a common IRQ number space and to use
a similar mapping of the machine qemu_irq array which is indexed by
the source number.

The XICS IRQ number range initially being [ 0x1000 - 0x2000 ], this
requires to change the XICS ICSState offset to 0 and to provision for
an extra 4K of source numbers and qemu_irqs which will never be used
by the machine when running under the XICS interrupt mode. This is not
an optimal solution.

Change the init() method to allocate an IRQ number space of the
expected size for the XICS sPAPR IRQ backend. It breaks the interrupt
signaling when under the 'dual' mode because source numbers have
unexpected values but next patch will fix that.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190213210756.27032-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00
Greg Kurz 0afed8c819 xive: Only set source type for LSIs
MSI is the default and LSI specific code is guarded by the
xive_source_irq_is_lsi() helper. The xive_source_irq_set()
helper is a nop for MSIs.

Simplify the code by turning xive_source_irq_set() into
xive_source_irq_set_lsi() and only call it for LSIs. The
call to xive_source_irq_set(false) in spapr_xive_irq_free()
is also a nop. Just drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <154999584656.690774.18352404495120358613.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00
Greg Kurz 5c7adcf422 spapr: Rename xics to intc in interrupt controller agnostic code
All this code is used with both the XICS and XIVE interrupt controllers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00
Cédric Le Goater a28b9a5a8d spapr: move the interrupt presenters under machine_data
Next step is to remove them from under the PowerPCCPU

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:18 +11:00
Cédric Le Goater 8907fc25cf ppc/pnv: introduce a CPU machine_data
Include the interrupt presenter under the machine_data as we plan to
remove it from under PowerPCCPU

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:18 +11:00
Cédric Le Goater 40a5056c41 xive: add a get_tctx() method to the XiveRouter
It provides a mean to retrieve the XiveTCTX of a CPU. This will become
necessary with future changes which move the interrupt presenter
object pointers under the PowerPCCPU machine_data.

The PowerNV machine has an extra requirement on TIMA accesses that
this new method addresses. The machine can perform indirect loads and
stores on the TIMA on behalf of another CPU. The PIR being defined in
the controller registers, we need a way to peek in the controller
model to find the PIR value.

The XiveTCTX is moved above the XiveRouter definition to avoid forward
typedef declarations.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:18 +11:00
Cédric Le Goater 6bf6f3a1d1 ppc/xive: fix remaining XiveFabric names
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:17 +11:00
BALATON Zoltan 7d8ccf58d5 ppc4xx: Use ram_addr_t in ppc4xx_sdram_adjust()
To avoid overflow if larger values are added later use ram_addr_t for
the sdram_bank_sizes parameter to match ram_size to which it is compared.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:17 +11:00
Thomas Huth 0d8d6a24fc ppc: Fix duplicated typedefs to be able to compile with Clang in gnu99 mode
When compiling the ppc code with clang and -std=gnu99, there are a
couple of warnings/errors like this one:

  CC      ppc64-softmmu/hw/intc/xics.o
In file included from hw/intc/xics.c:35:
include/hw/ppc/xics.h:43:25: error: redefinition of typedef 'ICPState' is a C11 feature
      [-Werror,-Wtypedef-redefinition]
typedef struct ICPState ICPState;
                        ^
target/ppc/cpu.h:1181:25: note: previous definition is here
typedef struct ICPState ICPState;
                        ^
Work around the problems by including the proper headers in spapr.h
and by using struct forward declarations in cpu.h.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-01-22 05:14:33 +01:00
Thomas Huth a51d5afc69 ppc: Move spapr-related prototypes from xics.h into a seperate header file
When compiling with Clang in -std=gnu99 mode, there is a warning/error:

  CC      ppc64-softmmu/hw/intc/xics_spapr.o
In file included from /home/thuth/devel/qemu/hw/intc/xics_spapr.c:34:
/home/thuth/devel/qemu/include/hw/ppc/xics.h:203:34: error: redefinition of typedef 'sPAPRMachineState' is a C11 feature
      [-Werror,-Wtypedef-redefinition]
typedef struct sPAPRMachineState sPAPRMachineState;
                                 ^
/home/thuth/devel/qemu/include/hw/ppc/spapr_irq.h:25:34: note: previous definition is here
typedef struct sPAPRMachineState sPAPRMachineState;
                                 ^

We have to remove the duplicated typedef here and include "spapr.h" instead.
But "spapr.h" should not be included for the pnv machine files. So move
the spapr-related prototypes into a new file called "xics_spapr.h" instead.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-01-22 05:14:33 +01:00
Cédric Le Goater 3a8eb78e6c spapr: enable XIVE MMIOs at reset
Depending on the interrupt mode of the machine, enable or disable the
XIVE MMIOs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 13db0cd9b8 spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
The 'dual' sPAPR IRQ backend supports both interrupt mode, XIVE
exploitation mode and the legacy compatibility mode (XICS). both modes
are not supported at the same time.

The machine starts with the legacy mode and a new interrupt mode can
then be negotiated by the CAS process. In this case, the new mode is
activated after a reset to take into account the required changes in
the machine. These impact the device tree layout, the interrupt
presenter object and the exposed MMIO regions in the case of XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 72c1e5a66a ppc/xics: allow ICSState to have an offset 0
commit 15ed653fa4 ("ppc/xics: An ICS with offset 0 is assumed to be
uninitialized") introduced an extra check on the ICS offset which is
not strictly necessary.

Revert the change to be able to map the XICS IRQ number space on the
XIVE IRQ number space.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 872ff3dea3 spapr: move the qemu_irq array under the machine
The qemu_irq array is now allocated at the machine level using a sPAPR
IRQ set_irq handler depending on the chosen interrupt mode. The use of
this handler is slightly inefficient today but it will become necessary
when the 'dual' interrupt mode is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater f8df900316 pnv/psi: move the ICSState qemu_irq array under the PSI device model
Future changes of the ICSState object will remove the qemu_irq array
from under the interrupt controller model. Prepare ground for the PSI
interrupt sources and introduce a new one directly under the PSI
device model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 734d9c8905 ppc: export the XICS and XIVE set_irq handlers
To support the 'dual' interrupt mode, XICS and XIVE, we plan to move
the qemu_irq array of each interrupt controller under the machine and
do the allocation under the sPAPR IRQ init method.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 8fa1f4ef38 spapr: modify the prototype of the cpu_intc_create() method
Today, the interrupt presenter is linked to a CPU using the
cpu_intc_create() method of the sPAPR IRQ backend. The resulting
object is assigned to the PowerPCCPU 'intc' pointer whatever the
interrupt mode, XICS or XIVE.

To support the 'dual' interrupt mode, we will need to distinguish
between the two presenter objects and for that, we plan to introduce a
second interrupt presenter object pointer under the PowerPCCPU. The
modifications below move the assignment of the presenter object under
the cpu_intc_create() method to prepare ground for the future changes.

Both sPAPR and PowerNV machines are impacted.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater a0c493ae67 spapr/xive: simplify the sPAPR IRQ qirq method for XIVE
The qirq routines of the XiveSource and the sPAPRXive model are only
used under the sPAPR IRQ backend. Simplify the overall call stack and
gather all the code under spapr_qirq_xive(). It will ease future
changes.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Alexey Kardashevskiy fea35ca4b8 ppc/spapr: Receive and store device tree blob from SLOF
SLOF receives a device tree and updates it with various properties
before switching to the guest kernel and QEMU is not aware of any changes
made by SLOF. Since there is no real RTAS (QEMU implements it), it makes
sense to pass the SLOF final device tree to QEMU to let it implement
RTAS related tasks better, such as PCI host bus adapter hotplug.

Specifially, now QEMU can find out the actual XICS phandle (for PHB
hotplug) and the RTAS linux,rtas-entry/base properties (for firmware
assisted NMI - FWNMI).

This stores the initial DT blob in the sPAPR machine and replaces it
in the KVMPPC_H_UPDATE_DT (new private hypercall) handler.

This adds an @update_dt_enabled machine property to allow backward
migration.

SLOF already has a hypercall since
https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183

This makes use of the new fdt_check_full() helper. In order to allow
the configure script to pick the correct DTC version, this adjusts
the DTC presense test.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:13 +11:00
Laurent Vivier c24ba3d0a3 spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITY
H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain
designation associated with the identifier input parameter

This fixes a crash when we try to hotplug a CPU in memory-less and
CPU-less numa node. In this case, the kernel tries to online the
node, but without the information provided by this h-call, the node id,
it cannot and the CPU is started while the node is not onlined.

It also removes the warning message from the kernel:
  VPHN is not supported. Disabling polling..

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:13 +11:00
Cédric Le Goater 3ba3d0bc33 spapr: introduce an 'ic-mode' machine option
This option is used to select the interrupt controller mode (XICS or
XIVE) with which the machine will operate. XICS being the default
mode for now.

When running a machine with the XIVE interrupt mode backend, the guest
OS is required to have support for the XIVE exploitation mode. In the
case of legacy OS, the mode selected by CAS should be XICS and the OS
should fail to boot. However, QEMU could possibly detect it, terminate
the boot process and reset to stop in the SLOF firmware. This is not
yet handled.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:43 +11:00
Cédric Le Goater db592b5b16 spapr: add an extra OV5 field to the sPAPR IRQ backend
The interrupt modes supported by the hypervisor are advertised to the
guest with new bits definitions of the option vector 5 of property
"ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are
defined as follow :

  0b00   PAPR 2.7 and earlier (Legacy systems)
  0b01   XIVE Exploitation mode only
  0b10   Either available

If the client/guest selects the XIVE interrupt mode, it informs the
hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00
value indicates the use of the XICS interrupt mode (Legacy systems).

The sPAPR IRQ backend is extended with these definitions and the
values are directly used to populate the "ibm,arch-vec-5-platform-support"
property. The interrupt mode is advertised under TCG and under KVM.
Although a KVM XIVE device is not yet available, the machine can still
operate with kernel_irqchip=off. However, we apply a restriction on
the CPU which is required to be a POWER9 when a XIVE interrupt
controller is in use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:43 +11:00
Cédric Le Goater b2e2247716 spapr: add a 'reset' method to the sPAPR IRQ backend
For the time being, the XIVE reset handler updates the OS CAM line of
the vCPU as it is done under a real hypervisor when a vCPU is
scheduled to run on a HW thread. This will let the XIVE presenter
engine find a match among the NVTs dispatched on the HW threads.

This handler will become even more useful when we introduce the
machine supporting both interrupt modes, XIVE and XICS. In this
machine, the interrupt mode is chosen by the CAS negotiation process
and activated after a reset.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:35 +11:00
Cédric Le Goater 1c53b06c03 spapr: extend the sPAPR IRQ backend for XICS migration
Introduce a new sPAPR IRQ handler to handle resend after migration
when the machine is using a KVM XICS interrupt controller model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:39:13 +11:00
Cédric Le Goater 1a937ad7e7 spapr: allocate the interrupt thread context under the CPU core
Each interrupt mode has its own specific interrupt presenter object,
that we store under the CPU object, one for XICS and one for XIVE.

Extend the sPAPR IRQ backend with a new handler to support them both.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:39:13 +11:00
Cédric Le Goater 6e21de4a50 spapr: add device tree support for the XIVE exploitation mode
The XIVE interface for the guest is described in the device tree under
the "interrupt-controller" node. A couple of new properties are
specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA), for the User level and for the Guest OS
   level. Only the Guest OS level is taken into account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the IRQ interrupt number ranges assigned to the guest for the IPIs.

and also under the root node :

 - "ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use. OPAL uses the priority 7 queue to automatically
   escalate interrupts for all other queues (DD2.X POWER9). So only
   priorities [0..6] are allowed for the guest.

Extend the sPAPR IRQ backend with a new handler to populate the DT
with the appropriate "interrupt-controller" node.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:39:07 +11:00
Cédric Le Goater 23bcd5eb9a spapr: add hcalls support for the XIVE exploitation interrupt mode
The different XIVE virtualization structures (sources and event queues)
are configured with a set of Hypervisor calls :

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (ESB) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification configuration associated
   with the queue, only unconditional notification is supported for
   the moment. Reset is performed with a queue size of 0 and queueing
   is disabled in that case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the guest's internal interrupt structures to their
   initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all notifications
   have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Folded in fix for field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater dcc345b61e spapr: introduce a new machine IRQ backend for XIVE
The XIVE IRQ backend uses the same layout as the new XICS backend but
covers the full range of the IRQ number space. The IRQ numbers for the
CPU IPIs are allocated at the bottom of this space, below 4K, to
preserve compatibility with XICS which does not use that range.

This should be enough given that the maximum number of CPUs is 1024
for the sPAPR machine under QEMU. For the record, the biggest POWER8
or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
cores, SMT8).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater 3aa597f650 spapr/xive: introduce a XIVE interrupt controller
sPAPRXive models the XIVE interrupt controller of the sPAPR machine.
It inherits from the XiveRouter and provisions storage for the routing
tables :

  - Event Assignment Structure (EAS)
  - Event Notification Descriptor (END)

The sPAPRXive model incorporates an internal XiveSource for the IPIs
and for the interrupts of the virtual devices of the guest. This model
is consistent with XIVE architecture which also incorporates an
internal IVSE for IPIs and accelerator interrupts in the IVRE
sub-engine.

The sPAPRXive model exports two memory regions, one for the ESB
trigger and management pages used to control the sources and one for
the TIMA pages. They are mapped by default at the addresses found on
chip 0 of a baremetal system. This is also consistent with the XIVE
architecture which defines a Virtualization Controller BAR for the
internal IVSE ESB pages and a Thread Managment BAR for the TIMA.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Fold in field accessor fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater af53dbf622 ppc/xive: introduce a simplified XIVE presenter
The last sub-engine of the XIVE architecture is the Interrupt
Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
IVPE share elements, the Power Bus interface (CQ), the routing table
descriptors, and they can be combined in the same HW logic. We do the
same in QEMU and combine both engines in the XiveRouter for
simplicity.

When the IVRE has completed its job of matching an event source with a
Notification Virtual Target (NVT) to notify, it forwards the event
notification to the IVPE sub-engine. The IVPE scans the thread
interrupt contexts of the Notification Virtual Targets (NVT)
dispatched on the HW processor threads and if a match is found, it
signals the thread. If not, the IVPE escalates the notification to
some other targets and records the notification in a backlog queue.

The IVPE maintains the thread interrupt context state for each of its
NVTs not dispatched on HW processor threads in the Notification
Virtual Target table (NVTT).

The model currently only supports single NVT notifications.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Folded in fix for field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:04 +11:00
Cédric Le Goater 207d9fe985 ppc/xive: introduce the XIVE interrupt thread context
Each POWER9 processor chip has a XIVE presenter that can generate four
different exceptions to its threads:

  - hypervisor exception,
  - O/S exception
  - Event-Based Branch (EBB)
  - msgsnd (doorbell).

Each exception has a state independent from the others called a Thread
Interrupt Management context. This context is a set of registers which
lets the thread handle priority management and interrupt acknowledgment
among other things. The most important ones being :

  - Interrupt Priority Register  (PIPR)
  - Interrupt Pending Buffer     (IPB)
  - Current Processor Priority   (CPPR)
  - Notification Source Register (NSR)

These registers are accessible through a specific MMIO region, called
the Thread Interrupt Management Area (TIMA), four aligned pages, each
exposing a different view of the registers. First page (page address
ending in 0b00) gives access to the entire context and is reserved for
the ring 0 view for the physical thread context. The second (page
address ending in 0b01) is for the hypervisor, ring 1 view. The third
(page address ending in 0b10) is for the operating system, ring 2
view. The fourth (page address ending in 0b11) is for user level, ring
3 view.

The thread interrupt context is modeled with a XiveTCTX object
containing the values of the different exception registers. The TIMA
region is mapped at the same address for each CPU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:29:12 +11:00
Cédric Le Goater 002686be42 ppc/xive: add support for the END Event State Buffers
The Event Notification Descriptor (END) XIVE structure also contains
two Event State Buffers providing further coalescing of interrupts,
one for the notification event (ESn) and one for the escalation events
(ESe). A MMIO page is assigned for each to control the EOI through
loads only. Stores are not allowed.

The END ESBs are modeled through an object resembling the 'XiveSource'
It is stateless as the END state bits are backed into the XiveEND
structure under the XiveRouter and the MMIO accesses follow the same
rules as for the XiveSource ESBs.

END ESBs are not supported by the Linux drivers neither on OPAL nor on
sPAPR. Nevetherless, it provides a mean to study the question in the
future and validates a bit more the XIVE model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fold in a later fix for field access]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:29:12 +11:00
Cédric Le Goater 1a518e7693 spapr: export and rename the xics_max_server_number() routine
The XIVE sPAPR IRQ backend will use it to define the number of ENDs of
the IC controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:29:10 +11:00
Cédric Le Goater fab397d84a spapr: introduce a spapr_irq_init() routine
Initialize the MSI bitmap from it as this will be necessary for the
sPAPR IRQ backend for XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:28:47 +11:00
Cédric Le Goater e4ddaac67f ppc/xive: introduce the XIVE Event Notification Descriptors
To complete the event routing, the IVRE sub-engine uses a second table
containing Event Notification Descriptor (END) structures.

An END specifies on which Event Queue (EQ) the event notification
data, defined in the associated EAS, should be posted when an
exception occurs. It also defines which Notification Virtual Target
(NVT) should be notified.

The Event Queue is a memory page provided by the O/S defining a
circular buffer, one per server and priority couple, containing Event
Queue entries. These are 4 bytes long, the first bit being a
'generation' bit and the 31 following bits the END Data field. They
are pulled by the O/S when the exception occurs.

The END Data field is a way to set an invariant logical event source
number for an IRQ. On sPAPR machines, it is set with the
H_INT_SET_SOURCE_CONFIG hcall when the EISN flag is used.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fold in a later fix from Cédric fixing field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:26:42 +11:00
Cédric Le Goater 7ff7ea9280 ppc/xive: introduce the XiveRouter model
The XiveRouter models the second sub-engine of the XIVE architecture :
the Interrupt Virtualization Routing Engine (IVRE).

The IVRE handles event notifications of the IVSE and performs the
interrupt routing process. For this purpose, it uses a set of tables
stored in system memory, the first of which being the Event Assignment
Structure (EAS) table.

The EAT associates an interrupt source number with an Event Notification
Descriptor (END) which will be used in a second phase of the routing
process to identify a Notification Virtual Target.

The XiveRouter is an abstract class which needs to be inherited from
to define a storage for the EAT, and other upcoming tables.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Folded in parts of a later fix by Cédric fixing field access]
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:26:31 +11:00
Cédric Le Goater 5e79b155a8 ppc/xive: introduce the XiveNotifier interface
The XiveNotifier offers a simple interface, between the XiveSource
object and the main interrupt controller of the machine. It will
forward event notifications to the XIVE Interrupt Virtualization
Routing Engine (IVRE).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Adjust type name string for XiveNotifier]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Cédric Le Goater 5fd9ef18a9 ppc/xive: add support for the LSI interrupt sources
The 'sent' status of the LSI interrupt source is modeled with the 'P'
bit of the ESB and the assertion status of the source is maintained
with an extra bit under the main XiveSource object. The type of the
source is stored in the same array for practical reasons.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Cédric Le Goater 02e3ff548d ppc/xive: introduce a XIVE interrupt source model
The first sub-engine of the overall XIVE architecture is the Interrupt
Virtualization Source Engine (IVSE). An IVSE can be integrated into
another logic, like in a PCI PHB or in the main interrupt controller
to manage IPIs.

Each IVSE instance is associated with an Event State Buffer (ESB) that
contains a two bit state entry for each possible event source. When an
event is signaled to the IVSE, by MMIO or some other means, the
associated interrupt state bits are fetched from the ESB and
modified. Depending on the resulting ESB state, the event is forwarded
to the IVRE sub-engine of the controller doing the routing.

Each supported ESB entry is associated with either a single or a
even/odd pair of pages which provides commands to manage the source:
to EOI, to turn off the source for instance.

On a sPAPR machine, the O/S will obtain the page address of the ESB
entry associated with a source and its characteristic using the
H_INT_GET_SOURCE_INFO hcall. On PowerNV, a similar OPAL call is used.

The xive_source_notify() routine is in charge forwarding the source
event notification to the routing engine. It will be filled later on.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Greg Kurz 9929301ee1 mac_newworld: simplify IRQ wiring
The OpenPIC have 5 outputs per connected CPU. The machine init code hence
needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs
between the PIC and the CPUs.

The current code first allocates an array of smp_cpus pointers to qemu_irq
type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the
first array with pointers to each line of the second array. This is rather
convoluted.

Simplify the logic by introducing a structured type that describes all the
OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only
allocate a smp_cpu sized array of those.

This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n)
as recommended in HACKING.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Suraj Jitindar Singh b9a477b725 ppc/spapr_caps: Add SPAPR_CAP_NESTED_KVM_HV
Add the spapr cap SPAPR_CAP_NESTED_KVM_HV to be used to control the
availability of nested kvm-hv to the level 1 (L1) guest.

Assuming a hypervisor with support enabled an L1 guest can be allowed to
use the kvm-hv module (and thus run it's own kvm-hv guests) by setting:
-machine pseries,cap-nested-hv=true
or disabled with:
-machine pseries,cap-nested-hv=false

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08 13:08:35 +11:00
Thomas Huth 0e947a89ce hw/ppc/spapr_rng: Introduce CONFIG_SPAPR_RNG switch for spapr_rng.c
The spapr-rng device is suboptimal when compared to virtio-rng, so
users might want to disable it in their builds. Thus let's introduce
a proper CONFIG switch to allow us to compile QEMU without this device.
The function spapr_rng_populate_dt is required for linking, so move it
to a different location.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08 12:04:40 +11:00
Cédric Le Goater ae83740237 spapr: increase the size of the IRQ number space
The new layout using static IRQ number does not leave much space to
the dynamic MSI range, only 0x100 IRQ numbers. Increase the total
number of IRQS for newer machines and introduce a legacy XICS backend
for pre-3.1 machines to maintain compatibility.

For the old backend, provide a 'nr_msis' value covering the full IRQ
number space as it does not use the bitmap allocator to allocate MSI
interrupt numbers.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-09-25 11:12:25 +10:00
Cédric Le Goater e39de895f6 spapr: introduce a spapr_irq class 'nr_msis' attribute
The number of MSI interrupts a sPAPR machine can allocate is in direct
relation with the number of interrupts of the sPAPRIrq backend. Define
statically this value at the sPAPRIrq class level and use it for the
"ibm,pe-total-#msi" property of the sPAPR PHB.

According to the PAPR specs, "ibm,pe-total-#msi" defines the maximum
number of MSIs that are available to the PE. We choose to advertise
the maximum number of MSIs that are available to the machine for
simplicity of the model and to avoid segmenting the MSI interrupt pool
which can be easily shared. If the pool limit is reached, it can be
extended dynamically.

Finally, remove XICS_IRQS_SPAPR which is now unused.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-09-25 11:12:25 +10:00
Cédric Le Goater ef01ed9d19 spapr: introduce a IRQ controller backend to the machine
This proposal moves all the related IRQ routines of the sPAPR machine
behind a sPAPR IRQ backend interface 'spapr_irq' to prepare for future
changes. First of which will be to increase the size of the IRQ number
space, then, will follow a new backend for the POWER9 XIVE IRQ controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Cédric Le Goater 82cffa2eb2 spapr: introduce a fixed IRQ number space
This proposal introduces a new IRQ number space layout using static
numbers for all devices, depending on a device index, and a bitmap
allocator for the MSI IRQ numbers which are negotiated by the guest at
runtime.

As the VIO device model does not have a device index but a "reg"
property, we introduce a formula to compute an IRQ number from a "reg"
value. It should minimize most of the collisions.

The previous layout is kept in pre-3.1 machines raising the
'legacy_irq_allocation' machine class flag.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Greg Kurz 71c55a1eef xics: don't include "target/ppc/cpu-qom.h" in "hw/ppc/xics.h"
The last user of the PowerPCCPU typedef in "hw/ppc/xics.h" vanished with
commit b1fd36c363. It isn't necessary to
include "target/ppc/cpu-qom.h" there anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Peter Maydell b07cd3e748 ppc patch queue 2018-07-03
Here's a last minue pull request before today's soft freeze.  Ideally
 I would have sent this earlier, but I was waiting for a couple of
 extra fixes I knew were close.  And the freeze crept up on me, like
 always.
 
 Most of the changes here are bugfixes in any case.  There are some
 cleanups as well, which have been in my staging tree for a little
 while.  There are a couple of truly new features (some extensions to
 the sam460ex platform), but these are low risk, since they only affect
 a new and not really stabilized machine type anyway.
 
 Higlights are:
   * Mac platform improvements from Mark Cave-Ayland
   * Sam460ex improvements from BALATON Zoltan et al.
   * XICS interrupt handler cleanups from Cédric Le Goater
   * TCG improvements for atomic loads and stores from Richard
     Henderson
   * Assorted other bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAls7D8oACgkQbDjKyiDZ
 s5Lxmg//YzPfC/nKqTTKkyJPzh/NnSC+kRTMAT3mbxdRIc7yfgMqJtWGGbS1iKgK
 EeJ9hl5Qm0HfscfDuzf0xasU62ZEv3kNdLnWJEIgkqiXrxoO5KCnC0y4D8NN1W03
 mvINNCa8+QDg2OsirGmNUTkriiG3wLIrHTpLZ4+JuC2Bd9H3nTHZgJ0MXON/1VWY
 oRgr6kMZ5+IAzPhvYLFR6l3nPI883fgJOFyRo7YqYrkVBKFrFkfK0Xjw6vpsNxcx
 2dE/YCHhNIriLuBG5noewL7GuqZRtLnl6rjjee5VAKIe1EmFeR+jsXwNjzGOVOJg
 dhjOtsJsQQ3WdEw5uImJzE64kV228WCgmkeXzZd1010JBLr7sUkrd2EuoZ23vvat
 uvZAHVSBrJg5WvzMo1VMEoPU3VeeZQ5HL+MI80iKiU6oUgRK11gVJcebtA0sEKt+
 zhJC4JiUlHtZLTGIpMBmU8DJZ3Tyk1cBEm+Ky+SaPE+dsz16UHI0fazFQXJnXphE
 MLHEGAyQgzWYp7kIcAjUFev0Geq/Uovy4JKIGI6ISop1wRPEQDxkthfkfRyQxQkE
 zuse4EBcEH/Undw9KrmEQa0hCe+8BRkxklVbPesFPPdqH3PKNxtHYuWpSShQF0PW
 XMjw43O2Rbsl8kBUHCpy4pYSugD1hpfgaw/mVUOU1u/M1O6toTw=
 =AHrx
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180703' into staging

ppc patch queue 2018-07-03

Here's a last minue pull request before today's soft freeze.  Ideally
I would have sent this earlier, but I was waiting for a couple of
extra fixes I knew were close.  And the freeze crept up on me, like
always.

Most of the changes here are bugfixes in any case.  There are some
cleanups as well, which have been in my staging tree for a little
while.  There are a couple of truly new features (some extensions to
the sam460ex platform), but these are low risk, since they only affect
a new and not really stabilized machine type anyway.

Higlights are:
  * Mac platform improvements from Mark Cave-Ayland
  * Sam460ex improvements from BALATON Zoltan et al.
  * XICS interrupt handler cleanups from Cédric Le Goater
  * TCG improvements for atomic loads and stores from Richard
    Henderson
  * Assorted other bugfixes

# gpg: Signature made Tue 03 Jul 2018 06:55:22 BST
# gpg:                using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-3.0-20180703: (35 commits)
  ppc: Include vga cirrus card into the compiling process
  target/ppc: Relax reserved bitmask of indexed store instructions
  target/ppc: set is_jmp on ppc_tr_breakpoint_check
  spapr: compute default value of "hpt-max-page-size" later
  target/ppc/kvm: don't pass cpu to kvm_get_smmu_info()
  target/ppc/kvm: get rid of kvm_get_fallback_smmu_info()
  ppc440_uc: Basic emulation of PPC440 DMA controller
  sam460ex: Add RTC device
  hw/timer: Add basic M41T80 emulation
  ppc4xx_i2c: Rewrite to model hardware more closely
  hw/ppc: Give sam46ex its own config option
  fpu_helper.c: fix setting FPSCR[FI] bit
  target/ppc: Implement the rest of gen_st_atomic
  target/ppc: Implement the rest of gen_ld_atomic
  target/ppc: Use atomic min/max helpers
  target/ppc: Use MO_ALIGN for EXIWX and ECOWX
  target/ppc: Split out gen_st_atomic
  target/ppc: Split out gen_ld_atomic
  target/ppc: Split out gen_load_locked
  target/ppc: Tidy gen_conditional_store
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	hw/ppc/spapr.c
2018-07-03 14:59:27 +01:00
Cédric Le Goater eeefd43b3c ppx/xics: introduce a parent_reset in ICSStateClass
Just like for the realize handlers, this makes possible to move the
common ICSState code of the reset handlers in the ics-base class.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-07-03 09:56:51 +10:00
Cédric Le Goater 0a647b76db ppc/xics: introduce a parent_realize in ICSStateClass
This makes possible to move the common ICSState code of the realize
handlers in the ics-base class.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-07-03 09:56:51 +10:00
Cédric Le Goater a028dd423e ppc/xics: introduce ICP DeviceRealize and DeviceReset handlers
This changes the ICP realize and reset handlers in DeviceRealize and
DeviceReset handlers. parent handlers are now called from the
inheriting classes which is a cleaner object pattern.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-07-03 09:56:51 +10:00
Philippe Mathieu-Daudé ab3dd74924 hw/ppc: Use the IEC binary prefix definitions
It eases code review, unit is explicit.

Patch generated using:

  $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/

and modified manually.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20180625124238.25339-33-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-02 15:41:16 +02:00
David Gibson 123eec6552 spapr: Use maximum page size capability to simplify memory backend checking
The way we used to handle KVM allowable guest pagesizes for PAPR guests
required some convoluted checking of memory attached to the guest.

The allowable pagesizes advertised to the guest cpus depended on the memory
which was attached at boot, but then we needed to ensure that any memory
later hotplugged didn't change which pagesizes were allowed.

Now that we have an explicit machine option to control the allowable
maximum pagesize we can simplify this.  We just check all memory backends
against that declared pagesize.  We check base and cold-plugged memory at
reset time, and hotplugged memory at pre_plug() time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22 14:19:07 +10:00
David Gibson 2309832afd spapr: Maximum (HPT) pagesize property
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means
that every page that the guest puts in the pagetables must be truly
physically contiguous, not just GPA-contiguous.  In effect this means that
an HPT guest can't use any pagesizes greater than the host page size used
to back its memory.

At present we handle this by changing what we advertise to the guest based
on the backing pagesizes.  This is pretty bad, because it means the guest
sees a different environment depending on what should be host configuration
details.

As a start on fixing this, we add a new capability parameter to the
pseries machine type which gives the maximum allowed pagesizes for an
HPT guest.  For now we just create and validate the parameter without
making it do anything.

For backwards compatibility, on older machine types we set it to the max
available page size for the host.  For the 3.0 machine type, we fix it to
16, the intention being to only allow HPT pagesizes up to 64kiB by default
in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22 14:19:07 +10:00
Cédric Le Goater 71b5c8d26e spapr: remove unused spapr_irq routines
spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
Cédric Le Goater 4fe75a8ccd spapr: split the IRQ allocation sequence
Today, when a device requests for IRQ number in a sPAPR machine, the
spapr_irq_alloc() routine first scans the ICSState status array to
find an empty slot and then performs the assignement of the selected
numbers. Split this sequence in two distinct routines : spapr_irq_find()
for lookups and spapr_irq_claim() for claiming the IRQ numbers.

This will ease the introduction of a static layout of IRQ numbers.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
David Gibson e2e4f64118 spapr: Add cpu_apply hook to capabilities
spapr capabilities have an apply hook to actually activate (or deactivate)
the feature in the system at reset time.  However, a number of capabilities
affect the setup of cpus, and need to be applied to each of them -
including hotplugged cpus for extra complication.  To make this simpler,
add an optional cpu_apply hook that is called from spapr_cpu_reset().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-06-21 21:22:53 +10:00
David Gibson 9f6edd066e spapr: Compute effective capability values earlier
Previously, the effective values of the various spapr capability flags
were only determined at machine reset time.  That was a lazy way of making
sure it was after cpu initialization so it could use the cpu object to
inform the defaults.

But we've now improved the compat checking code so that we don't need to
instantiate the cpus to use it.  That lets us move the resolution of the
capability defaults much earlier.

This is going to be necessary for some future capabilities.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-06-21 21:22:53 +10:00
Cédric Le Goater 77864267c3 ppc/pnv: introduce Pnv8Chip and Pnv9Chip models
It introduces a base PnvChip class from which the specific processor
chip classes, Pnv8Chip and Pnv9Chip, inherit. Each of them needs to
define an init and a realize routine which will create the controllers
of the target processor. For the moment, the base PnvChip class
handles the XSCOM bus and the cores.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
Greg Kurz b94020268e spapr_cpu_core: migrate per-CPU data
A per-CPU machine data pointer was recently added to PowerPCCPU. The
motivation is to to hide platform specific details from the core CPU
code. This per-CPU data can hold state which is relevant to the guest
though, eg, Virtual Processor Areas, and we should migrate this state.

This patch adds the plumbing so that we can migrate the per-CPU data
for PAPR guests. We only do this for newer machine types for the sake
of backward compatibility. No state is migrated for the moment: the
vmstate_spapr_cpu_state structure will be populated by subsequent
patches.

Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix some trivial spelling and spacing errors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
Cédric Le Goater 04026890f2 ppc/pnv: introduce a new isa_create() operation to the chip model
This moves the details of the ISA bus creation under the LPC model but
more important, the new PnvChip operation will let us choose the chip
class to use when we introduce the different chip classes for Power9
and Power8. It hides away the processor chip controllers from the
machine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
Cédric Le Goater d35aefa9ae ppc/pnv: introduce a new intc_create() operation to the chip model
On Power9, the thread interrupt presenter has a different type and is
linked to the chip owning the cores.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00
David Gibson 7388efafc2 target/ppc, spapr: Move VPA information to machine_data
CPUPPCState currently contains a number of fields containing the state of
the VPA.  The VPA is a PAPR specific concept covering several guest/host
shared memory areas used to communicate some information with the
hypervisor.

As a PAPR concept this is really machine specific information, although it
is per-cpu, so it doesn't really belong in the core CPU state structure.

There's also other information that's per-cpu, but platform/machine
specific.  So create a (void *)machine_data in PowerPCCPU which can be
used by the machine to locate per-cpu data.  Intialization, lifetime and
cleanup of machine_data is entirely up to the machine type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2018-06-16 16:32:50 +10:00
David Gibson 08304a8689 pnv_core: Allocate cpu thread objects individually
Currently, we allocate space for all the cpu objects within a single core
in one big block.  This was copied from an older version of the spapr code
and requires some ugly pointer manipulation to extract the individual
objects.

This design was due to a misunderstanding of qemu lifetime conventions and
has already been changed in spapr (in 94ad93bd "spapr_cpu_core: instantiate
CPUs separately".

Make an equivalent change in pnv_core to get rid of the nasty pointer
arithmetic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland f1114c17ee mac_newworld: add via machine option to control mac99 VIA/ADB configuration
This option allows the VIA configuration to be controlled between 3
different possible setups: cuda, pmu-adb and pmu with USB rather than ADB
keyboard/mouse.

For the moment we don't do anything with the configuration except to pass
it to the macio device (the via-cuda parent) and also to the firmware via
the fw_cfg interface so that it can present the correct device tree.

The default is cuda which is the current default and so will have no
change in behaviour.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Cédric Le Goater d61c285703 ppc/pnv: fix LPC HC firmware address space
A specific MemoryRegion is required for the LPC HC Firmware address
space.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-12 10:44:36 +10:00
Mark Cave-Ayland 5b64db9754 ppc: add missing FW_CFG_PPC_NVRAM_FLAT definition
This is used in OpenBIOS to define the memory layout of the NVRAM device. Whilst
currently left at its default value, add the missing definition to ensure it is
reserved.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-12 10:44:36 +10:00
Thomas Huth f23c81073a trivial: Do not include pci.h if it is not necessary
There is no need to include pci.h in these files.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2018-05-20 08:40:00 +03:00
David Hildenbrand 0c9269a52d spapr: rename "hotplug memory" terminology to "device memory"
Let's make it clear at relevant places that we are dealing with device
memory. That it can be used for memory hotplug is just a special case.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-11-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07 10:00:02 -03:00
David Hildenbrand b0c14ec4ef machine: make MemoryHotplugState accessible via the machine
Let's allow to query the MemoryHotplugState directly from the machine.
If the pointer is NULL, the machine does not support memory devices. If
the pointer is !NULL, the machine supports memory devices and the
data structure contains information about the applicable physical
guest address space region.

This allows us to generically detect if a certain machine has support
for memory devices, and to generically manage it (find free address
range, plug/unplug a memory region).

We will rename "MemoryHotplugState" to something more meaningful
("DeviceMemory") after we completed factoring out the pc-dimm code into
MemoryDevice code.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
[ehabkost: squashed fix to use g_malloc0()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07 10:00:02 -03:00
David Gibson 84369f639e spapr: Make a helper to set up cpu entry point state
Under PAPR, only the boot CPU is active when the system starts.  Other cpus
must be explicitly activated using an RTAS call.  The entry state for the
boot and secondary cpus isn't identical, but it has some things in common.
We're going to add a bit more common setup later, too, so to simplify
make a helper which sets up the common entry state for both boot and
secondary cpu threads.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-05-04 15:00:37 +10:00
Bharata B Rao a324d6f166 spapr: Support ibm,dynamic-memory-v2 property
The new property ibm,dynamic-memory-v2 allows memory to be represented
in a more compact manner in device tree.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-04-27 18:05:23 +10:00
David Gibson 644a2c99a9 target/ppc: Pass cpu instead of env to ppc_create_page_sizes_prop()
As a rule we prefer to pass PowerPCCPU instead of CPUPPCState, and this
change will make some things simpler later on.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-04-27 18:05:22 +10:00
Suraj Jitindar Singh c76c0d3090 ppc/spapr-caps: Convert cap-ibs to custom spapr-cap
Convert cap-ibs (indirect branch speculation) to a custom spapr-cap
type.

All tristate caps have now been converted to custom spapr-caps, so
remove the remaining support for them.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Don't explicitly list "?"/help option, trust convention]
[dwg: Fold tristate removal into here, to not break bisect]
[dwg: Fix minor style problems]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06 13:16:29 +11:00
Mark Cave-Ayland f7bd7941d8 openpic: move OpenPIC state and related definitions to openpic.h
This is to faciliate access to OpenPICState when wiring up the PIC to the macio
controller.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06 13:16:29 +11:00
Mark Cave-Ayland 8d085cf03b openpic: move KVM-specific declarations into separate openpic_kvm.h file
This is needed before the next patch because the target-dependent kvm stub
uses the existing kvm_openpic_connect_vcpu() declaration, making it impossible
to move the device-specific declarations into the same file without breaking
ppc-linux-user compilation.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06 13:16:29 +11:00
Markus Armbruster 9af2398977 Include less of the generated modular QAPI headers
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.

The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h.  Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.

To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects.  The next commit will
improve it further.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02 13:45:50 -06:00
Greg Kurz 14bb4486c8 spapr: rename spapr_vcpu_id() to spapr_get_vcpu_id()
The spapr_vcpu_id() function is an accessor actually. Let's rename it
for symmetry with the recently added spapr_set_vcpu_id() helper.

The motivation behind this is that a later patch will consolidate
the VCPU id formula in a function and spapr_vcpu_id looks like an
appropriate name.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16 12:14:26 +11:00
Greg Kurz 648edb6475 spapr: move VCPU calculation to core machine code
The VCPU ids are currently computed and assigned to each individual
CPU threads in spapr_cpu_core_realize(). But the numbering logic
of VCPU ids is actually a machine-level concept, and many places
in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes.

The current formula used in spapr_cpu_core_realize() is:

    vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i

where:

    cc->core_id is a multiple of smp_threads
    cpu_index = cc->core_id + i
    0 <= i < smp_threads

So we have:

    cpu_index % smp_threads == i
    cc->core_id / smp_threads == cpu_index / smp_threads

hence:

    vcpu_id =
        (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads;

This formula was used before VSMT at the time VCPU ids where computed
at the target emulation level. It has the advantage of being useable
to derive a VPCU id out of a CPU index only. It is fitted for all the
places where the machine code has to compute a VCPU id.

This patch introduces an accessor to set the VCPU id in a PowerPCCPU object
using the above formula. It is a first step to consolidate all the VCPU id
logic in a single place.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16 12:14:26 +11:00
Suraj Jitindar Singh c59704b254 target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS
The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query
behaviours and available characteristics of the cpu.

Implement the handler for this new H-Call which formulates its response
based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh 4be8d4e7d9 target/ppc/spapr_caps: Add new tristate cap safe_indirect_branch
Add new tristate cap cap-ibs to represent the indirect branch
serialisation capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh 09114fd817 target/ppc/spapr_caps: Add new tristate cap safe_bounds_check
Add new tristate cap cap-sbbc to represent the speculation barrier
bounds checking capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh 8f38eaf8f9 target/ppc/spapr_caps: Add new tristate cap safe_cache
Add new tristate cap cap-cfpc to represent the cache flush on privilege
change capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh 6898aed77f target/ppc/spapr_caps: Add support for tristate spapr_capabilities
spapr_caps are used to represent the level of support for various
capabilities related to the spapr machine type. Currently there is
only support for boolean capabilities.

Add support for tristate capabilities by implementing their get/set
functions. These capabilities can have the values 0, 1 or 2
corresponding to broken, workaround and fixed.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh 8acc2ae5e9 target/ppc/kvm: Add cap_ppc_safe_[cache/bounds_check/indirect_branch]
Add three new kvm capabilities used to represent the level of host support
for three corresponding workarounds.

Host support for each of the capabilities is queried through the
new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The
first two, character and behaviour, represent the available
characteristics of the cpu and the behaviour of the cpu respectively.
The second two, c_mask and b_mask, represent the mask of known bits for
the character and beheviour dwords respectively.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Correct some compile errors due to name change in final kernel
 patch version]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29 14:24:55 +11:00
Cédric Le Goater 9eff7830c4 ppc/pnv: fix PnvChip redefinition in <hw/ppc/pnv_xscom.h>
This redefinition generates warnings on some clang compilers and older
gcc4.4.

...include/hw/ppc/pnv_xscom.h:24:24: warning: redefinition of typedef 'PnvChip' is a C11
      feature [-Wtypedef-redefinition]
typedef struct PnvChip PnvChip;
                       ^
...include/hw/ppc/pnv.h:65:3: note: previous definition is here
} PnvChip;
  ^
1 warning generated.
  CC      ppc64-softmmu/hw/ppc/pnv_xscom.o

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-27 17:25:27 +11:00
Cédric Le Goater c035851ac0 ppc/pnv: fix XSCOM core addressing on POWER9
The XSCOM base address of the core chiplet was wrongly calculated. Use
the OPAL macros to fix that and do a couple of renames.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
Cédric Le Goater b3b066e9d8 ppc/pnv: introduce pnv*_is_power9() helpers
These are useful when instantiating device models which are shared
between the POWER8 and the POWER9 processor families.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
Suraj Jitindar Singh 4e5fe3688e hw/ppc/spapr_caps: Rework spapr_caps to use uint8 internal representation
Currently spapr_caps are tied to boolean values (on or off). This patch
reworks the caps so that they can have any uint8 value. This allows more
capabilities with various values to be represented in the same way
internally. Capabilities are numbered in ascending order. The internal
representation of capability values is an array of uint8s in the
sPAPRMachineState, indexed by capability number.

Capabilities can have their own name, description, options, getter and
setter functions, type and allow functions. They also each have their own
section in the migration stream. Capabilities are only migrated if they
were explictly set on the command line, with the assumption that
otherwise the default will match.

On migration we ensure that the capability value on the destination
is greater than or equal to the capability value from the source. So
long at this remains the case then the migration is considered
compatible and allowed to continue.

This patch implements generic getter and setter functions for boolean
capabilities. It also converts the existings cap-htm, cap-vsx and
cap-dfp capabilities to this new format.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
David Gibson 2d1fb9bc8e spapr: Handle Decimal Floating Point (DFP) as an optional capability
Decimal Floating Point has been available on POWER7 and later (server)
cpus.  However, it can be disabled on the hypervisor, meaning that it's
not available to guests.

We currently handle this by conditionally advertising DFP support in the
device tree depending on whether the guest CPU model supports it - which
can also depend on what's allowed in the host for -cpu host.  That can lead
to confusion on migration, since host properties are silently affecting
guest visible properties.

This patch handles it by treating it as an optional capability for the
pseries machine type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson 2938664286 spapr: Handle VMX/VSX presence as an spapr capability flag
We currently have some conditionals in the spapr device tree code to decide
whether or not to advertise the availability of the VMX (aka Altivec) and
VSX vector extensions to the guest, based on whether the guest cpu has
those features.

This can lead to confusion and subtle failures on migration, since it makes
a guest visible change based only on host capabilities.  We now have a
better mechanism for this, in spapr capabilities flags, which explicitly
depend on user options rather than host capabilities.

Rework the advertisement of VSX and VMX based on a new VSX capability.  We
no longer bother with a conditional for VMX support, because every CPU
that's ever been supported by the pseries machine type supports VMX.

NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on
availability of VSX in libc, so using cap-vsx=off may lead to a fatal
SIGILL in init.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson be85537d65 spapr: Validate capabilities on migration
Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration.  Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.

This adds code to migrate the capabilities flags.  These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence.  However,
they are checked against the destination capabilities.

If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour.  If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson ee76a09fc7 spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
This adds an spapr capability bit for Hardware Transactional Memory.  It is
enabled by default for pseries-2.11 and earlier machine types. with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it).  However it is disabled by default for the latest pseries-2.12
machine type.

This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:

 * This way running with -M pseries,accel=tcg will start with whatever cpu
   and will provide the same guest visible model as with accel=kvm.
     - More specifically, this means existing make check tests don't have
       to be modified to use cap-htm=off in order to run with TCG

 * We hope to add a new "HTM without suspend" feature in the not too
   distant future which could work on both POWER8 and POWER9 cpus, and
   could be enabled by default.

 * Best guesses suggest that future POWER cpus may well only support the
   HTM-without-suspend model, not the (frankly, horribly overcomplicated)
   POWER8 style HTM with suspend.

 * Anecdotal evidence suggests problems with HTM being enabled when it
   wasn't wanted are more common than being missing when it was.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson 33face6b89 spapr: Capabilities infrastructure
Because PAPR is a paravirtual environment access to certain CPU (or other)
facilities can be blocked by the hypervisor.  PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.

In some places we automatically determine whether to make a feature
available based on whether our host can support it, in most cases this is
based on limitations in the available KVM implementation.

Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generaly reducing reproducibility, it means that a migration
between different host environments can easily go bad.

We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability.  But, it's still not right and some limitations on
POWER9 is going to make it more of an issue in future.

This introduces an infrastructure for defining "sPAPR capabilities".  These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overriden with machine properties.

The intention is at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations).  If not we simply fail, rather than silently modifying the
advertised featureset to the guest.

This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
Cédric Le Goater b168a138a8 ppc/pnv: change powernv_ prefix to pnv_ for overall naming consistency
The 'pnv' prefix is now used for all and the routines populating the
device tree start with 'pnv_dt'. The handler of the PnvXScomInterface
is also renamed to 'dt_xscom' which should reflect that it is
populating the device tree under the 'xscom@' node of the chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-10 12:53:00 +11:00
Greg Kurz bb2d8ab636 spapr: fix LSI interrupt specifiers in the device tree
LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the
PowerPC External Interrupt Source Controller node as follows:

“#interrupt-cells”

  Standard property name to define the number of cells in an interrupt-
  specifier within an interrupt domain.

  prop-encoded-array: An integer, encoded as with encode-int, that denotes
  the number of cells required to represent an interrupt specifier in its
  child nodes.

  The value of this property for the PowerPC External Interrupt option shall
  be 2. Thus all interrupt specifiers (as used in the standard “interrupts”
  property) shall consist of two cells, each containing an integer encoded
  as with encode-int. The first integer represents the interrupt number the
  second integer is the trigger code: 0 for edge triggered, 1 for level
  triggered.

This patch fixes the interrupt specifiers in the "interrupt-map" property
of the PHB node, that were setting the second cell to 8 (confusion with
IRQ_TYPE_LEVEL_LOW ?) instead of 1.

VIO devices and RTAS event sources use the same format for interrupt
specifiers: while here, we introduce a common helper to handle the
encoding details.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
--
v3: - reference public LoPAPR instead of internal PAPR+ in changelog
    - change helper name to spapr_dt_xics_irq()

v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes
    - introduce a common helper to encode interrupt specifiers
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater 7718375584 spapr: introduce a spapr_qirq() helper
xics_get_qirq() is only used by the sPAPR machine. Let's move it there
and change its name to reflect its scope. It will be useful for XIVE
support which will use its own set of qirqs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater 60c6823b9b spapr: move the IRQ allocation routines under the machine
Also change the prototype to use a sPAPRMachineState and prefix them
with spapr_irq_. It will let us synchronise the IRQ allocation with
the XIVE interrupt mode when available.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater 4f7a47beeb ppc/xics: introduce an icp_create() helper
The sPAPR and the PowerNV core objects create the interrupt presenter
object of the CPUs in a very similar way. Let's provide a common
routine in which we use the presenter 'type' as a child identifier.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Greg Kurz 94ad93bd97 spapr_cpu_core: instantiate CPUs separately
The current code assumes that only the CPU core object holds a
reference on each individual CPU object, and happily frees their
allocated memory when the core is unrealized. This is dangerous
as some other code can legitimely keep a pointer to a CPU if it
calls object_ref(), but it would end up with a dangling pointer.

Let's allocate all CPUs with object_new() and let QOM free them
when their reference count reaches zero. This greatly simplify the
code as we don't have to fiddle with the instance size anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:23 +11:00
Greg Kurz dcb556fc6a xics/kvm: synchonize state before 'info pic'
When using the emulated XICS, the 'info pic' monitor command shows:

CPU 0 XIRR=ff000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10040060340
  1000 MSI 05 00
  1001 MSI 05 00
  1002 MSI 05 00
  1003 MSI ff 00
  1004 LSI ff 00
  1005 LSI ff 00
  1006 LSI ff 00
  1007 LSI ff 00
  1008 MSI 05 00
  1009 MSI 05 00
  100a MSI 05 00
  100b MSI 05 00
  100c MSI 05 00

but when using the in-kernel XICS with the very same guest, we get:

CPU 0 XIRR=00000000 ((nil)) PP=ff MFRR=ff
ICS 1000..13ff 0x10032e00340
  1000 MSI ff 00
  1001 MSI ff 00
  1002 MSI ff 00
  1003 MSI ff 00
  1004 LSI ff 00
  1005 LSI ff 00
  1006 LSI ff 00
  1007 LSI ff 00
  1008 MSI ff 00
  1009 MSI ff 00
  100a MSI ff 00
  100b MSI ff 00
  100c MSI ff 00

ie, all irqs are masked and XIRR is null, while we should get the
same output as with the emulated XICS.

If the guest is then migrated, 'info pic' shows the expected values
on both source and destination.

The problem is that QEMU doesn't synchronize with KVM before printing
the XICS state. Migration happens to fix the output because it enforces
synchronization with KVM.

To fix the invalid output of 'info pic', this patch introduces a new
synchronize_state operation for both ICPStateClass and ICSStateClass.
The ICP operation relies on run_on_cpu() in order to kick the vCPU
and avoid sleeping on KVM_GET_ONE_REG.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-14 11:12:42 +11:00
Igor Mammedov 40abf43f72 ppc: pnv: drop PnvChipClass::cpu_model field
deduce core type directly from chip type instead of
maintaining type mapping in PnvChipClass::cpu_model.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov 35bdb9def2 ppc: pnv: drop PnvCoreClass::cpu_oc field
deduce cpu type directly from core type instead of
maintaining type mapping in PnvCoreClass::cpu_oc and doing
extra cpu_model parsing in pnv_core_class_init()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov 7fd544d8a7 ppc: pnv: normalize core/chip type names
typically for cpus/core type names following convention is used

   new_type_prefix-superclass_typename

make PNV core/chip to follow common convention.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov 4a12c699d3 ppc: pnv: use generic cpu_model parsing
use common cpu_model prasing in vl.c and set default cpu_model
using generic MachineClass::default_cpu_type.

Beside of switching to generic infrastructure it solves several
issues.

 * ppc_cpu_class_by_name() is used to deal with lower/upper case
   and alias translations into actual cpu type, which fixes
    '-M powernv -cpu power8' and '-M powernv -cpu power9_v1.0'
   usecases which error out with:
    'invalid CPU model 'FOO' for powernv machine'
 * allows to switch to lower-case typenames in pnv chip/core name
   (by convention typnames should be lower-case)
 * replace aliased names /power8, power9, .../ with exact cpu model
   names (i.e. typenames should be stable but aliases might decide to
   point to other cpu model withi family or changed by kvm). It will
   also help to simplify pnv_chip/core code and get rid of dependency
   on cpu_model parsing.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Updated to make DD2.0 as default POWER9 chip]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov 2e9c10eba0 ppc: spapr: use generic cpu_model parsing
use generic cpu_model parsing introduced by
 (6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init())

it allows to:
  * replace sPAPRMachineClass::tcg_default_cpu with
    MachineClass::default_cpu_type
  * drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse
    one in vl.c
  * simplify spapr_get_cpu_core_type() by removing
    not needed anymore recurrsion since alias look up
    happens earlier at vl.c and spapr_get_cpu_core_type()
    works only with resulted from that cpu type.
  * spapr no more needs to parse/depend on being phased out
    MachineState::cpu_model, all tha parsing done by generic
    code and target specific callback.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Correct minor compile error]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov 5bbb264186 ppc: spapr: register 'host' core type along with the rest of core types
consolidate 'host' core type registration by moving it from
KVM specific code into spapr_cpu_core.c, similar like it's
done in x86 target.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov b51d3c8818 ppc: spapr: use cpu type name directly
replace sPAPRCPUCoreClass::cpu_class with cpu type name
since it were needed just to get that at points it were
accessed.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov 44cd95e31a ppc: spapr: define core types statically
spapr core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov b8e999673b ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr()
there is a dedicated callback CPUClass::parse_features
which purpose is to convert -cpu features into a set of
global properties AND deal with compat/legacy features
that couldn't be directly translated into CPU's properties.

Create ppc variant of it (ppc_cpu_parse_featurestr) and
move 'compat=val' handling from spapr_cpu_core.c into it.
That removes a dependency of board/core code on cpu_model
parsing and would let to reuse common -cpu parsing
introduced by 6063d4c0

Set "max-cpu-compat" property only if it exists, in practice
it should limit 'compat' hack to spapr machine and allow
to avoid including machine/spapr headers in target/ppc/cpu.c

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov a1063aa8a5 ppc: spapr: replace ppc_cpu_parse_features() with cpu_parse_cpu_model()
ppc_cpu_parse_features() is doing practically the same thing as
generic cpu_parse_cpu_model(). So remove duplicated impl. and
reuse generic one.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Mark Cave-Ayland ecba28dbf2 mac_dbdma: remove DBDMA_init() function
Instead we can now instantiate the MAC_DBDMA object directly within the
macio device. We also add the DBDMA device as a child property so that
it is possible to retrieve later.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland 1d27f351af mac_dbdma: QOMify
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland 2bb4a98f90 mac_dbdma: remove unused IO fields from DBDMAState
These fields were used to manually handle IO requests that weren't aligned
to a sector boundary before this feature was supported by the block API.

Once the block API changed to support byte-aligned IO requests, the macio
controller was switched over to use it in commit be1e343 but these fields
were accidentally left behind. Remove them, including the initialisation
in DBDMA_init().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Eric Blake 5261158d21 ppc/pnv: Improve macro parenthesization
Although none of the existing macro call-sites were broken,
it's always better to write macros that properly parenthesize
arguments that can be complex expressions, so that the intended
order of operations is not broken.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt 58b6283586 ppc: Fix OpenPIC model
Apple uses an IBM MPIC2A without timers, it has 64 sources.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Cédric Le Goater 21f3f8db0e ppc/xive: fix OV5_XIVE_EXPLOIT bits
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bits definition which have
evolved in the latest specs.

The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only
 - 0b10 XIVE legacy or exploitation mode

The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Sam Bobroff fa98fbfcdf PPC: KVM: Support machine option to set VSMT mode
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
read only. Doing so causes KVM to act, for that VM, as if the host's
SMT mode was the given value. This is particularly important on Power
9 systems because their default value is 1, but they are able to
support values up to 8.

This patch introduces a way to control this capability via a new
machine property called VSMT ("Virtual SMT"). If the value is not set
on the command line a default is chosen that is, when possible,
compatible with legacy systems.

Note that the intialization of KVM_CAP_PPC_SMT has changed slightly
because it has changed (in KVM) from a global capability to a
VM-specific one. This won't cause a problem on older KVMs because VM
capabilities fall back to global ones.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
BALATON Zoltan 0453428047 ppc4xx: Make MAL emulation more generic
Allow MAL with more RX and TX channels as found in newer versions.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
BALATON Zoltan 517284a771 ppc4xx: Move MAL from ppc405_uc to ppc4xx_devs
This device appears in other SoCs as well not just in 405 ones

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Sam Bobroff 2e886fb391 ppc: spapr: Make VCPU ID handling private to SPAPR
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Daniel Henrique Barboza 10f12e6450 hw/ppc: CAS reset on early device hotplug
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].

At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.

In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, guest kernel
Ooops when trying to unplug the device.

A proper fix would be to turn every device hotplugged before CAS
as a colplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that will need to managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are
questions to be made in QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.

With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.

This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:

- the DRC conditions that warrant a CAS reset is the same as those that
triggers a DRC migration - the DRC must have a device attached and
the DRC state is not equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.

- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRC that requires a reset, using spapr_drc_needed. If
that happens, returns '1' in 'spapr_h_cas_compose_response' which will set
spapr->cas_reboot to true, causing the machine to reboot.

No changes are made for coldplug devices.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza 5625817423 hw/ppc: clear pending_events on machine reset
The sPAPR machine isn't clearing up the pending events QTAILQ on
machine reboot. This allows for unprocessed hotplug/epow events
to persist in the queue after reset and, when reasserting the IRQs in
check_exception later on, these will be being processed by the OS.

This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:54 +10:00
David Gibson 2772cf6be9 pseries: Use smaller default hash page tables when guest can resize
We've now implemented a PAPR extension allowing PAPR guest to resize
their hash page table (HPT) during runtime.

This patch makes use of that facility to allocate smaller HPTs by default.
Specifically when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.

When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.

If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maxmimum
memory size (unless it's been configured not to allow such guests at all).

For now we make that reallocation assuming the guest has not yet used the
HPT at all.  That's true in practice, but not, strictly, an architectural
or PAPR requirement.  If we need to in future we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
2017-07-17 15:07:05 +10:00
David Gibson 0b0b831016 pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table.  This will eventually allow for more flexible memory
hotplug.

The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function.  The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.

The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion.  If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.

The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT.  The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).

For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT.  This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).

In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs.  That's a project for another day, but should be possible
without any changes to the guest interface.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
David Gibson 30f4b05bd0 pseries: Stubs for HPT resizing
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow run time resizing of a guest's hash page table.  It
also adds a new machine property for controlling whether this new
facility is available.

For now we only allow resizing with TCG, allowing it with KVM will require
kernel changes as well.

Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls.  This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-07-17 15:07:05 +10:00
Alexey Kardashevskiy 2ee77040f5 ppc/pnv: Remove unused XICSState reference
e6f7e110ee "ppc/xics: remove the XICSState classes" got rid of
XICSState, this is just an leftover.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
David Gibson 67fea71bf3 spapr: Implement DR-indicator for physical DRCs only
According to PAPR, the DR-indicator should only be valid for physical DRCs,
not logical DRCs.  At the moment we implement it for all DRCs, so restrict
it to physical ones only.

We move the state to the physical DRC subclass, which means adding some
QOM boilerplate to handle the newly distinct type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson 4445b1d27e spapr: Remove sPAPRConfigureConnectorState sub-structure
Most of the time, the state of a DRC object is contained in the single
'state' variable.  However, during the transition from UNISOLATE to
CONFIGURED state requires multiple calls to the ibm,configure-connector
RTAS call to retrieve the device tree for the attached device.  We need
some extra state to keep track of where we're up to in delivering the
device tree information to the guest.

Currently that extra state is in a sPAPRConfigureConnectorState
substructure which is only allocated when we're in the middle of the
configure connector process.  That sounds like a good idea, but the extra
state is only two integers - on many platforms that will take up the same
room as the (maybe NULL) ccs pointer even before malloc() overhead.  Plus
it's another object whose lifetime we need to manage.  In short, it's not
worth it.

So, fold the sPAPRConfigureConnectorState substructure directly into the
DRC object.

Previously the structure was allocated lazily when the configure-connector
call discovers it's not there.  Now, we need to initialize the subfields
pre-emptively, as soon as we enter UNISOLATE state.

Although it's not strictly necessary (the field values should only ever
be consulted when in UNISOLATE state), we try to keep them at -1 when in
other states, as a debugging aid.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson 9d4c0f4f0a spapr: Consolidate DRC state variables
Each DRC has three fields describing its state: isolation_state,
allocation_state and configured.  At first this seems like a reasonable
representation, since its based directly on the PAPR defined
isolation-state and allocation-state indicators.  However:
  * Only a few combinations of the two fields' values are permitted
  * allocation_state isn't used at all for physical DRCs
  * The indicators are write only so they don't really have a well
    defined current value independent of each other

This replaces these variables with a single state variable, whose names
and numbers are based on the diagram in LoPAPR section 13.4.  Along with
this we add code to check the current state on various operations and make
sure the requested transition is permitted.

Strictly speaking, this makes guest visible changes to behaviour (since we
probably allowed some transitions we shouldn't have before).  However, a
hypothetical guest broken by that wasn't PAPR compliant, and probably
wouldn't have worked under PowerVM.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson f1c52354e5 spapr: Cleanups relating to DRC awaiting_release field
'awaiting_release' indicates that the host has requested an unplug of the
device attached to the DRC, but the guest has not (yet) put the device
into a state where it is safe to complete removal.

1. Rename it to 'unplug_requested' which to me at least is clearer

2. Remove the ->release_pending() method used to check this from outside
spapr_drc.c.  The method only plausibly has one implementation, so use
a plain function (spapr_drc_unplug_requested()) instead.

3. Remove it from the migration stream.  Attempting to migrate mid-unplug
is broken not just for spapr - in general management has no good way to
determine if the device should be present on the destination or not.  So,
until that's fixed, there's no point adding extra things to the stream.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson a8dc47fd82 spapr: Refactor spapr_drc_detach()
This function has two unused parameters - remove them.

It also sets awaiting_release on all paths, except one.  On that path
setting it is harmless, since it will be immediately cleared by
spapr_drc_release().  So factor it out of the if statements.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson 82a93a1d30 spapr: Remove 'awaiting_allocation' DRC flag
The awaiting_allocation flag in the DRC was introduced by aab9913
"spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to
prevent a guest crash on racing attach and detach.  Except.. information
from the BZ actually suggests a qemu crash, not a guest crash.  And there
shouldn't be a problem here anyway: if the guest has already moved the DRC
away from UNUSABLE state, the detach would already be deferred, and if it
hadn't it should be safe to detach it (the guest should fail gracefully
when it attempts to change the allocation state).

I think this was probably just a bandaid for some other problem in the
state management.  So, remove awaiting_allocation and associated code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
Laurent Vivier 94fd9cbaa3 spapr: Treat devices added before inbound migration as coldplugged
When migrating a guest which has already had devices hotplugged,
libvirt typically starts the destination qemu with -incoming defer,
adds those hotplugged devices with qmp, then initiates the incoming
migration.

This causes problems for the management of spapr DRC state.  Because
the device is treated as hotplugged, it goes into a DRC state for a
device immediately after it's plugged, but before the guest has
acknowledged its presence.  However, chances are the guest on the
source machine *has* acknowledged the device's presence and configured
it.

If the source has fully configured the device, then DRC state won't be
sent in the migration stream: for maximum migration compatibility with
earlier versions we don't migrate DRCs in coldplug-equivalent state.
That means that the DRC effectively changes state over the migrate,
causing problems later on.

In addition, logging hotplug events for these devices isn't what we
want because a) those events should already have been issued on the
source host and b) the event queue should get wiped out by the
incoming state anyway.

In short, what we really want is to treat devices added before an
incoming migration as if they were coldplugged.

To do this, we first add a spapr_drc_hotplugged() helper which
determines if the device is hotplugged in the sense relevant for DRC
state management.  We only send hotplug events when this is true.
Second, when we add a device which isn't hotplugged in this sense, we
force a reset of the DRC state - this ensures the DRC is in a
coldplug-equivalent state (there isn't usually a system reset between
these device adds and the incoming migration).

This is based on an earlier patch by Laurent Vivier, cleaned up and
extended.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson 5341258e86 spapr: Minor cleanups to events handling
The rtas_error_log structure is marked packed, which strongly suggests its
precise layout is important to match an external interface.  Along with
that one could expect it to have a fixed endianness to match the same
interface.  That used to be the case - matching the layout of PAPR RTAS
event format and requiring BE fields.

Now, however, it's only used embedded within sPAPREventLogEntry with the
fields in native order, since they're processed internally.

Clear that up by removing the nested structure in sPAPREventLogEntry.
struct rtas_error_log is moved back to spapr_events.c where it is used as
a temporary to help convert the fields in sPAPREventLogEntry to the correct
in memory format when delivering an event to the guest.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
Daniel Henrique Barboza fd38804b38 spapr: migrate pending_events of spapr state
In racing situations between hotplug events and migration operation,
a rtas hotplug event could have not yet be delivered to the source
guest when migration is started. In this case the pending_events of
spapr state need be transmitted to the target so that the hotplug
event can be finished on the target.

To achieve the minimal VMSD possible to migrate the pending_events list,
this patch makes the changes in spapr_events.c:

- 'log_type' of sPAPREventLogEntry struct deleted. This information can be
derived by inspecting the rtas_error_log summary field. A new function
called 'spapr_event_log_entry_type' was added to retrieve the type of
a given sPAPREventLogEntry.

- sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The
only data we're going to migrate in the VMSD is the event log data itself,
which can be divided in two parts: a rtas_error_log header and an extended
event log field. The rtas_error_log header contains information about the
size of the extended log field, which can be used inside VMSD as the size
parameter of the VBUFFER_ALOC field that will store it. To allow this use,
the header.extended_length field must be exposed inline to the VMSD instead
of embedded into a 'data' field that holds everything. With this in mind,
the following changes were done:

    * a new 'header' field was added to sPAPREventLogEntry. This field holds a
a struct rtas_error_log inline.
    * the declaration of the 'rtas_error_log' struct was moved to spapr.h
to be visible to the VMSD macros.
    * 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and
now holds only the contents of the extended event log.
   *  'struct rtas_error_log hdr' were taken away from both epow_log_full
and hp_log_full. This information is now available at the header field of
sPAPREventLogEntry.
   * epow_log_full and hp_log_full were renamed to epow_extended_log and
hp_extended_log respectively. This rename makes it clearer to understand
the new purpose of both structures: hold the information of an extended
event log field.
    * spapr_powerdown_req and spapr_hotplug_req_event now creates a
sPAPREventLogEntry structure that contains the full rtas log entry.
    * rtas_event_log_queue and rtas_event_log_dequeue now receives a
sPAPREventLogEntry pointer as a parameter instead of a void pointer.

- the endianess of the sPAPREventLogEntry header is now native instead
of be32. We can use the fields in native endianess internally and write
them in be32 in the guest physical memory inside 'check_exception'. This
allows the VMSD inside spapr.c to read the correct size of the
entended_log field.

- inside spapr.c, pending_events is put in a subsection in the spapr state
VMSD to make sure migration across different versions is not broken.

A small change in rtas_event_log_queue and rtas_event_log_dequeue were also
made: instead of calling qdev_get_machine(), both functions now receive
a pointer to the sPAPRMachineState. This pointer is already available in
the callers of these functions and we don't need to waste resources
calling qdev() again.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
Alexey Kardashevskiy 1221a47467 memory/iommu: introduce IOMMUMemoryRegionClass
This finishes QOM'fication of IOMMUMemoryRegion by introducing
a IOMMUMemoryRegionClass. This also provides a fastpath analog for
IOMMU_MEMORY_REGION_GET_CLASS().

This makes IOMMUMemoryRegion an abstract class.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170711035620.4232-3-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-14 12:04:41 +02:00
Alexey Kardashevskiy 3df9d74806 memory/iommu: QOM'fy IOMMU MemoryRegion
This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.

This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
dymanic QOM casting in fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides new helper to
do simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.

This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20170711035620.4232-2-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-14 12:04:41 +02:00
Cédric Le Goater f2b14e3a9f spapr: introduce the XIVE_EXPLOIT option in CAS
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility
(the former POWER8 interrupt model) or in XIVE exploitation mode (the
newer POWER9 interrupt model).

Bit 7 of Byte 23 of vector 5 is used for this purpose.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:02 +10:00
David Gibson 5c1da81215 spapr: Remove unnecessary differences between hotplug and coldplug paths
spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into
configured state initially, instead of the usual ISOLATED/UNUSABLE state.
It turns out this is unnecessary: although coldplugged devices do need to
be in CONFIGURED state once the guest starts, that will already be
accomplished by the reset code which will move DRCs for already plugged
devices into a coldplug equivalent state.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2017-07-11 11:04:01 +10:00
David Gibson 6b762f29a8 spapr: Add DRC release method
At the moment, spapr_drc_release() has an ugly switch on the DRC type to
call the right, device-specific release function.  This cleans it up by
doing that via a proper QOM method.

It's still arguably an abstraction violation for the DRC code to call into
the specific device code, but one mess at a time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2017-07-11 11:04:01 +10:00
Greg Kurz 498cd99544 spapr: refresh "platform-specific" hcalls comment
We have more of these since the addition of KVMPPC_H_LOGICAL_MEMOP in 2012.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:01 +10:00
Greg Kurz 04d0ffbd52 spapr: make spapr_populate_hotplug_cpu_dt() static
Since commit ff9006ddbf ("spapr: move spapr_core_[foo]plug() callbacks
close to machine code in spapr.c"), this function doesn't need to be extern
anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:01 +10:00
David Gibson 0dfabd39d5 spapr: Clean up DRC set_isolation_state() path
There are substantial differences in the various paths through
set_isolation_state(), both for setting to ISOLATED versus UNISOLATED
state and for logical versus physical DRCs.

So, split the set_isolation_state() method into isolate() and unisolate()
methods, and give it different implementations for the two DRC types.

Factor some minimal common checks, including for valid indicator values
(which we weren't previously checking) into rtas_set_isolation_state().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-30 14:03:32 +10:00
David Gibson 617367321e spapr: Clean up DRC set_allocation_state path
The allocation-state indicator should only actually be implemented for
"logical" DRCs, not physical ones.  Factor a check for this, and also for
valid indicator state values into rtas_set_allocation_state().  Because
they don't exist for physical DRCs, there's no reason that we'd ever want
more than one method implementation, so it can just be a plain function.

In addition, the setting to USABLE and setting to UNUSABLE paths in
set_allocation_state() don't actually have much in common.  So, split the
method separate functions for each parameter value (drc_set_usable()
and drc_set_unusable()).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-30 14:03:32 +10:00
David Gibson 307b7715d0 spapr: Eliminate DRC 'signalled' state variable
The 'signalled' field in the DRC appears to be entirely a torturous
workaround for the fact that PCI devices were started in UNISOLATED state
for unclear reasons.

1) 'signalled' is already meaningless for logical (so far, all non PCI)
DRCs.  It's always set to true (at least at any point it might be tested),
and can't be assigned any real meaning due to the way signalling works for
logical DRCs.

2) For PCI DRCs, the only time signalled would be false is when non-zero
functions of a multifunction device are hotplugged, followed by function
zero (the other way around is explicitly not permitted). In that case the
secondary function DRCs are attached, but the notification isn't sent to
the guest until function 0 is plugged.

3) signalled being false is used to allow a DRC detach to switch mode
back to ISOLATED state, which allows a secondary function to be hotplugged
then unplugged with function 0 never inserted.  Without this a secondary
function starting in UNISOLATED state couldn't be detached again without
function 0 being inserted, all the functions configured by the guest, then
sent back to ISOLATED state.

4) But now that PCI DRCs start in ISOLATED state, there's nothing to be
done.  If the guest doesn't get the notification, it won't switch the
device to UNISOLATED state, so nothing prevents it from being unplugged.
If the guest does move it to UNISOLATED state without the signal (due to
a manual drmgr call, for instance) then it really isn't safe to unplug it.

So, this patch removes the signalled variable and all code related to it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-30 14:03:32 +10:00
Greg Kurz 46f7afa370 spapr: fix migration of ICPState objects from/to older QEMU
Commit 5bc8d26de2 ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
This is an improvement since we no longer allocate ICPState objects
that will never be used. But it has the side-effect of breaking
migration of older machine types from older QEMU versions.

This patch allows spapr to register dummy "icp/server" entries to vmstate.
These entries use a dedicated VMStateDescription that can swallow and
discard state of an incoming migration stream, and that don't send anything
on outgoing migration.

As for real ICPState objects, the instance_id is the cpu_index of the
corresponding vCPU, which happens to be equal to the generated instance_id
of older machine types.

The machine can unregister/register these entries when CPUs are dynamically
plugged/unplugged.

This is only available for pseries-2.9 and older machines, thanks to a
compat property.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
David Gibson 7843c0d60d pseries: Move CPU compatibility property to machine
Server class POWER CPUs have a "compat" property, which is used to set the
backwards compatibility mode for the processor.  However, this only makes
sense for machine types which don't give the guest access to hypervisor
privilege - otherwise the compatibility level is under the guest's control.

To reflect this, this removes the CPU 'compat' property and instead
creates a 'max-cpu-compat' property on the pseries machine.  Strictly
speaking this breaks compatibility, but AFAIK the 'compat' option was
never (directly) used with -device or device_add.

The option was used with -cpu.  So, to maintain compatibility, this
patch adds a hack to the cpu option parsing to strip out any compat
options supplied with -cpu and set them on the machine property
instead of the now deprecated cpu property.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30 14:03:31 +10:00
Laurent Vivier 593080936a Revert "spapr: fix memory hot-unplugging"
This reverts commit fe6824d126.

Conflicts hw/ppc/spapr_drc.c, because get_index() has been renamed
spapr_get_index().

This didn't fix the problem. Once the hotplug has been started
some memory is allocated and some structures are allocated.
We don't free it when we ignore the unplug, and we can't because
they can be in use by the kernel.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-09 12:35:46 +10:00
Greg Kurz b1fd36c363 xics: drop ICPStateClass::cpu_setup() handler
The cpu_setup() handler is only implemented by xics_kvm, where it really
does a typical "realize" job. Moreover, the realize() handler is called
shortly after cpu_setup(), on the same path.

This patch converts xics_kvm to implement realize() instead of cpu_setup().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-09 12:17:59 +10:00
Greg Kurz 9ed656631d xics: setup cpu at realize time
Until recently, spapr used to allocate ICPState objects for the lifetime
of the machine. They would only be associated to vCPUs in xics_cpu_setup()
when plugging a CPU core.

Now that ICPState objects have the same lifecycle as vCPUs, it is
possible to associate them during realization.

This patch hence open-codes xics_cpu_setup() in icp_realize(). The vCPU
is passed as a property. Note that vCPU now needs to be realized first
for the IRQs to be allocated. It also needs to resetted before ICPState
realization in order to synchronize with KVM.

Since ICPState objects are freed when unrealized, xics_cpu_destroy() isn't
needed anymore and can be safely dropped.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-09 12:15:57 +10:00
Greg Kurz 100f738850 xics: pass appropriate types to realize() handlers.
It makes more sense to pass an IPCState * to handlers of ICPStateClass
instead of a DeviceState *, if only to benefit from compile time type
checking. The same goes with ICSStateClass.

While here, we also change the declaration of ICPStateClass in xics.h
for consistency.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-09 12:12:34 +10:00
Greg Kurz ad265631c0 xics: introduce macros for ICP/ICS link properties
These properties are part of the XICS API. They deserve to appear
explicitely in the XICS header file.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-09 12:12:34 +10:00
Greg Kurz a4d4edce7a xics: add reset() handler to ICPStateClass
Taking into account that qemu_set_irq() returns immediatly if its first
argument is NULL, icp_kvm_reset() largely duplicates icp_reset().

This patch introduces a reset() handler, so that the common logic can
be implemented in icp_reset() only.

While there we can also drop icp_kvm_realize() and icp_kvm_unrealize(). This
causes icp-kvm to be realized in icp_realize(), which sets icp->xics, but
it has no impact.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-08 14:38:27 +10:00
David Gibson 7980833619 spapr: Rework DRC name handling
DRC objects have a get_name method which returns the DRC name generated
when the DRC is created.  Replace that with a fixed spapr_drc_name()
function which generates the name on the fly from other information.  This
means:
  * We get rid of a method with only one implementation, and only local
    callers
  * We don't have to carry the name string around for the lifetime of the
    DRC
  * We use information added to the class structure to generate the name
    in standard format, so we don't need an explicit switch on drc type
    any more

We also eliminate the 'name' property; it's basically useless since the
only information in it can easily be deduced from other things.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:27 +10:00
David Gibson 0be4e88621 spapr: Change DRC attach & detach methods to functions
DRC objects have attach & detach methods, but there's only one
implementation.  Although there are some differences in its behaviour for
different DRC types, the overall structure is the same, so while we might
want different method implementations for some parts, we're unlikely to
want them for the top-level functions.

So, replace them with direct function calls.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:26 +10:00
David Gibson cd74d27e42 spapr: Clean up handling of DR-indicator
There are 3 types of "indicator" associated with hotplug in the PAPR spec
the "allocation state", "isolation state" and "DR-indicator".  The first
two are intimately tied to the various state transitions associated with
hotplug.  The DR-indicator, however, is different and simpler.

It's basically just a guest controlled variable which can be used by the
guest to flag state or problems associated with a device.  The idea is that
the hypervisor can use it to present information back on management
consoles (on some machines with PowerVM it may even control physical LEDs
on the machine case associated with the relevant device).

For that reason, there's only ever likely to be a single update
implementation so the set_indicator_state method isn't useful.  Replace it
with a direct function call.

While we're there, make some small associated cleanups:
  * PAPR doesn't use the term "indicator state", just "DR-indicator" and
the allocation state and isolation state are also considered "indicators".
Rename things to be less confusing
  * Fold set_indicator_state() and rtas_set_indicator_state() into a single
rtas_set_dr_indicator() function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:26 +10:00
David Gibson f224d35be9 spapr: Clean up DR entity sense handling
DRC classes have an entity_sense method to determine (in a specific PAPR
sense) the presence or absence of a device plugged into a DRC.  However,
we only have one implementation of the method, which explicitly tests for
different DRC types.  This changes it to instead have different method
implementations for the two cases: "logical" and "physical" DRCs.

While we're at it, the entity sense method always returns RTAS_OUT_SUCCESS,
and the interesting value is returned via pass-by-reference.  Simplify this
to directly return the value we care about

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:26 +10:00
David Gibson 1693ea1685 spapr: Eliminate spapr_drc_get_type_str()
This function was used in generating the device tree.  However, now that
we have different QOM types for different DRC types we can easily store
the information we need in the class structure and avoid this specialized
lookup function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:24:21 +10:00
David Gibson b8fdd530be spapr: Move configure-connector state into DRC
Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector
structures which store intermediate state for the ibm,configure-connector
RTAS call.

This was an attempt to separate this state from the core of the DRC state.
However the configure connector process is intimately tied to the DRC
model, so there's really no point trying to have two levels of interface
here.

Moving the configure-connector state into its corresponding DRC allows
removal of a number of helpers for maintaining the anciliary list.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:24:17 +10:00
David Gibson fbf5539718 spapr: Clean up spapr_dr_connector_by_*()
* Change names to something less ludicrously verbose
 * Now that we have QOM subclasses for the different DRC types, use a QOM
   typename instead of a PAPR type value parameter

The latter allows removal of the get_type_shift() helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:24:08 +10:00
David Gibson 2d33581899 spapr: Introduce DRC subclasses
Currently we only have a single QOM type for all DRCs, but lots of
places where we switch behaviour based on the DRC's PAPR defined type.
This is a poor use of our existing type system.

So, instead create QOM subclasses for each PAPR defined DRC type.  We
also introduce intermediate subclasses for physical and logical DRCs,
a division which will be useful later on.

Instead of being stored in the DRC object itself, the PAPR type is now
stored in the class structure.  There are still many places where we
switch directly on the PAPR type value, but this at least provides the
basis to start to remove those.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:23:46 +10:00
David Gibson 0b55aa91c9 spapr: Make DRC get_index and get_type methods into plain functions
These two methods only have one implementation, and the spec they're
implementing means any other implementation is unlikely, verging on
impossible.

So replace them with simple functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06 08:53:24 +10:00
David Gibson 4f65ce00ab spapr: Abolish DRC set_configured method
DRConnectorClass has a set_configured method, however:
  * There is only one implementation, and only ever likely to be one
  * There's exactly one caller, and that's (now) local
  * The implementation is very straightforward

So abolish the method entirely, and just open-code what we need.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06 08:53:24 +10:00
David Gibson 88af6ea568 spapr: Abolish DRC get_fdt method
The DRConnectorClass includes a get_fdt method.  However
  * There's only one implementation, and there's only likely to ever be one
  * Both callers are local to spapr_drc
  * Each caller only uses one half of the actual implementation

So abolish get_fdt() entirely, and just open-code what we need.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06 08:53:24 +10:00
Daniel Henrique Barboza 318347234d hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque
The pointer drc->detach_cb is being used as a way of informing
the detach() function inside spapr_drc.c which cb to execute. This
information can also be retrieved simply by checking drc->type and
choosing the right callback based on it. In this context, detach_cb
is redundant information that must be managed.

After the previous spapr_lmb_release change, no detach_cb_opaques
are being used by any of the three callbacks functions. This is
yet another information that is now unused and, on top of that, can't
be migrated either.

This patch makes the following changes:

- removal of detach_cb_opaque. the 'opaque' argument was removed from
the callbacks and from the detach() function of sPAPRConnectorClass. The
attribute detach_cb_opaque of sPAPRConnector was removed.

- removal of detach_cb from the detach() call. The function pointer
detach_cb of sPAPRConnector was removed. detach() now uses a
switch(drc->type) to execute the apropriate callback. To achieve this,
spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
callbacks were made public to be visible inside detach().

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:33 +10:00
David Gibson 0cffce56ae hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState
The LMB DRC release callback, spapr_lmb_release(), uses an opaque
parameter, a sPAPRDIMMState struct that stores the current LMBs that
are allocated to a DIMM (nr_lmbs). After each call to this callback,
the nr_lmbs is decremented by one and, when it reaches zero, the callback
proceeds with the qdev calls to hot unplug the LMB.

Using drc->detach_cb_opaque is problematic because it can't be migrated in
the future DRC migration work. This patch makes the following changes to
eliminate the usage of this opaque callback inside spapr_lmb_release:

- sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
attribute called 'addr' was added to it. This is used as an unique
identifier to associate a sPAPRDIMMState to a PCDIMM element.

- sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
that are currently going under an unplug process.

- spapr_lmb_release() will now retrieve the nr_lmbs value by getting the
correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address
was created to fetch the address of a PCDIMM device inside spapr_lmb_release.
When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug
calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs.

After these changes, the opaque argument for spapr_lmb_release is now
unused and is passed as NULL inside spapr_del_lmbs. This and the other
opaque arguments can now be safely removed from the code.

As an additional cleanup made by this patch, the spapr_del_lmbs function
was merged with spapr_memory_unplug_request. The former was being called
only by the latter and both were small enough to fit one single function.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Minor stylistic cleanups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:28 +10:00
Daniel Henrique Barboza bff3063837 hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry
Currenty we do not have any RTAS event that is reported by the
event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and
RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception
interface and, as such, marked as 'exception=true'.

Commit 79853e18d9, 'spapr_events: event-scan RTAS interface', added
the event_scan interface because the guest kernel requires it to
initialize other required interfaces. It is acting since then as
a stub because no events that would be reported by it were added
since then. However, the existence of the 'exception' boolean adds
an unnecessary load in the future migration of the pending_events,
sPAPREventLogEntry QTAILQ that hosts the pending RTAS events.

To make the code cleaner and ease the future migration changes, this
patch makes the following changes:

- remove the 'exception' boolean that filter these events. There is
nothing to filter since all events are reported by check-exception;

- functions rtas_event_log_queue, rtas_event_log_dequeue and
rtas_event_log_contains don't receive the 'exception' boolean
as parameter;

- event_scan function was simplified. It was calling
'rtas_event_log_dequeue(mask, false)' that was always returning
'NULL' because we have no events that are created with
exception=false, thus in the end it would execute a jump to
'out_no_events' all the time. The function now assumes that
this will always be the case and all the remaining logic were
deleted.

In the future, when or if we add new RTAS events that should
be reported with the event_scan interface, we can refer to
the changes made in this patch to add the event_scan logic
back.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:53 +10:00
Greg Kurz de86eccc0c xics_kvm: cache already enabled vCPU ids
Since commit a45863bda9 ("xics_kvm: Don't enable KVM_CAP_IRQ_XICS if
already enabled"), we were able to re-hotplug a vCPU that had been hot-
unplugged ealier, thanks to a boolean flag in ICPState that we set when
enabling KVM_CAP_IRQ_XICS.

This could work because the lifecycle of all ICPState objects was the
same as the machine. Commit 5bc8d26de2 ("spapr: allocate the ICPState
object from under sPAPRCPUCore") broke this assumption and now we always
pass a freshly allocated ICPState object (ie, with the flag unset) to
icp_kvm_cpu_setup().

This cause re-hotplug to fail with:

Unable to connect CPU8 to kernel XICS: Device or resource busy

Let's fix this by caching all the vCPU ids for which KVM_CAP_IRQ_XICS was
enabled. This also drops the now useless boolean flag from ICPState.

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Bharata B Rao 06ec79e865 spapr: Consolidate HPT freeing code into a routine
Consolidate the code that frees HPT into a separate routine
spapr_free_hpt() as the same chunk of code is called from two places.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Greg Kurz f63ebfe0ac ppc/xics: simplify prototype of xics_spapr_init()
This function only does hypercall and RTAS-call registration, and thus
never returns an error. This patch adapt the prototype to reflect that.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Stefan Hajnoczi ba9915e1f8 x86 and machine queue, 2017-05-11
Highlights:
 * New "-numa cpu" option
 * NUMA distance configuration
 * migration/i386 vmstatification
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn
 RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j
 2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11
 q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT
 4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE
 lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W
 VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI
 /ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07
 plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ
 /QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC
 LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC
 CfdxWB5kYM6/lLbOHj94
 =48wH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging

x86 and machine queue, 2017-05-11

Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification

# gpg: Signature made Thu 11 May 2017 08:16:07 PM BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: Note: This key has expired!
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* ehabkost/tags/x86-and-machine-pull-request: (29 commits)
  migration/i386: Remove support for pre-0.12 formats
  vmstatification: i386 FPReg
  migration/i386: Remove old non-softfloat 64bit FP support
  tests: check -numa node,cpu=props_list usecase
  numa: add '-numa cpu,...' option for property based node mapping
  numa: remove node_cpu bitmaps as they are no longer used
  numa: use possible_cpus for not mapped CPUs check
  machine: call machine init from wrapper
  numa: remove no longer need numa_post_machine_init()
  tests: numa: add case for QMP command query-cpus
  QMP: include CpuInstanceProperties into query_cpus output output
  virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
  numa: mirror cpu to node mapping in MachineState::possible_cpus
  numa: add check that board supports cpu_index to node mapping
  virt-arm: add node-id property to CPU
  pc: add node-id property to CPU
  spapr: add node-id property to sPAPR core
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:12:03 +01:00
Igor Mammedov 0b8497f08c spapr: add node-id property to sPAPR core
it will allow switching from cpu_index to core based numa
mapping in follow up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
David Gibson eaf87a3976 pnv: Fix build failures on some host platforms
This makes some changes to fix build failures on the 'min-glib' docker
image, and maybe other platforms with a buildchain that's less tolerant
about duplicated typedefs.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Paolo Bonzini 5c6b487d67 ppc: xics: fix compilation with CentOS 6
The PowerPCCPU typedef is included twice if a file includes
both hw/ppc/xics.h and target/ppc/cpu-qom.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Sam Bobroff 229e16fd24 ppc/xics: preserve P and Q bits for KVM IRQs
Kernel commit 17d48610ae0f ("KVM: PPC: Book 3S: XICS: Implement ICS
P/Q states") added new bits to the state used by KVM IRQs. Currently,
QEMU does not preserve these bits, so migrating (or otherwise saving
and restoring) the guest state causes the P and Q bits to be cleared.

Clearing the P bit has no effect, because the kernel will set it based
on other data, but the loss of a set Q bit will cause a lost
interrupt.

This patch preserves the P and Q bits, correcting the problem.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Cédric Le Goater bce0b69159 ppc/pnv: generate an OEM SEL event on shutdown
OpenPOWER systems expect to be notified with such an event before a
shutdown or a reboot. An OEM SEL message is sent with specific
identifiers and a user data containing the request : OFF or REBOOT.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:41:56 +10:00
Cédric Le Goater aeaef83dab ppc/pnv: add initial IPMI sensors for the BMC simulator
Skiboot, the firmware for the PowerNV platform, expects the BMC to
provide some specific IPMI sensors. These sensors are exposed in the
device tree and their values are updated by the firmware at boot time.

Sensors of interest are :

	"FW Boot Progress"
	"Boot Count"

As such a device is defined on the command line, we can only detect
its presence at reset time.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:41:56 +10:00
Benjamin Herrenschmidt 4d1df88b63 ppc/pnv: Add support for POWER8+ LPC Controller
It adds the Naples chip which supports proper LPC interrupts via the
LPC controller rather than via an external CPLD.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
      - ported on latest PowerNV patchset
      - moved the IRQ handler in pnv_lpc.c
      - introduced pnv_lpc_isa_irq_create() to create the ISA IRQs ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:41:55 +10:00
Cédric Le Goater 71cd4dace9 spapr: remove the 'nr_servers' field from the machine
xics_system_init() does not need 'nr_servers' anymore as it is only
used to define the 'interrupt-controller' node in the device tree. So
let's just compute the value when calling spapr_dt_xics().

This also gives us an opportunity to simplify the xics_system_init()
routine and introduce a specific spapr_ics_create() helper to create
the sPAPR ICS object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:41:55 +10:00
Benjamin Herrenschmidt 0722d05ad8 ppc/pnv: Add OCC model stub with interrupt support
The OCC is an on-chip microcontroller based on a ppc405 core used
for various power management tasks. It comes with a pile of additional
hardware sitting on the PIB (aka XSCOM bus). At this point we don't
emulate it (nor plan to do so). However there is one facility which
is provided by the surrounding hardware that we do need, which is the
interrupt generation facility. OPAL uses it to send itself interrupts
under some circumstances and there are other uses around the corner.

So this implement just enough to support this.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
      - changed the XSCOM interface to fit new model
      - QOMified the model ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater 54f59d786c ppc/pnv: Add cut down PSI bridge model and hookup external interrupt
The Processor Service Interface (PSI) Controller is one of the engines
of the "Bridge" unit which connects the different interfaces to the
Power Processor.

This adds just enough of the PSI bridge to handle various on-chip and
the one external interrupt. The rest of PSI has to do with the link to
the IBM FSP service processor which we don't plan to emulate (not used
on OpenPower machines).

The ics_get() and ics_resend() handlers of the XICSFabric interface of
the PowerNV machine are now defined to handle the Interrupt Control
Source of PSI. The InterruptStatsProvider interface is also modified
to dump the new ICS.

Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater bf5615e77c ppc/pnv: add memory regions for the ICP registers
This provides to a PowerNV chip (POWER8) access to the Interrupt
Management area, which contains the registers of the Interrupt Control
Presenters of each thread. These are used to accept, return, forward
interrupts in the system.

This area is modeled with a per-chip container memory region holding
all the ICP registers. Each thread of a chip is then associated with
its ICP registers using a memory subregion indexed by its PIR number
in the overall region.

The device tree is populated accordingly.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater 5509db4aec ppc/pnv: add a helper to calculate MMIO addresses registers
Some controllers (ICP, PSI) have a base register address which is
calculated using the chip id.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater 99285aae16 ppc/pnv: add a PnvICPState object
This provides a new ICPState object for the PowerNV machine (POWER8).
Access to the Interrupt Management area is done though a memory
region. It contains the registers of the Interrupt Control Presenters
of each thread which are used to accept, return, forward interrupts in
the system.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater 439071a92d ppc/xics: add a realize() handler to ICPStateClass
It will be used by derived classes in PowerNV for customization.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater 5bc8d26de2 spapr: allocate the ICPState object from under sPAPRCPUCore
Today, all the ICPs are created before the CPUs, stored in an array
under the sPAPR machine and linked to the CPU when the core threads
are realized. This modeling brings some complexity when a lookup in
the array is required and it can be simplified by allocating the ICPs
when the CPUs are.

This is the purpose of this proposal which introduces a new 'icp_type'
field under the machine and creates the ICP objects of the right type
(KVM or not) before the PowerPCCPU object are.

This change allows more cleanups : the removal of the icps array under
the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
helper.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater ad5d1add86 ppc/xics: introduce an 'intc' backlink under PowerPCCPU
Today, the ICPState array of the sPAPR machine is indexed with
'cpu_index' of the CPUState. This numbering of CPUs is internal to
QEMU and the guest only knows about what is exposed in the device
tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper
xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places.

To provide a more generic XICS layer, we need to abstract the IRQ
'server' number and remove any assumption made on its nature. It
should not be used as a 'cpu_index' for lookups like xics_cpu_setup()
and xics_cpu_destroy() do.

To reach that goal, we choose to introduce a generic 'intc' backlink
under PowerPCCPU, and let the machine core init routine do the
ICPState lookup. The resulting object is passed on to xics_cpu_setup()
which does the store under PowerPCCPU. The IRQ 'server' number in XICS
is now generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR'
number.

This also has the benefit of simplifying the sPAPR hcall routines
which do not need to do any ICPState lookups anymore.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Sam Bobroff e957f6a9b9 spapr: Workaround for broken radix guests
For a little while around 4.9, Linux kernels that saw the radix bit in
ibm,pa-features would attempt to set up the MMU as if they were a
hypervisor, even if they were a guest, which would cause them to
crash.

Work around this by detecting pre-ISA 3.0 guests by their lack of that
bit in option vector 1, and then removing the radix bit from
ibm,pa-features. Note: This now requires regeneration of that node
after CAS negotiation.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Sam Bobroff 9fb4541f58 spapr: Enable ISA 3.0 MMU mode selection via CAS
Add the new node, /chosen/ibm,arch-vec-5-platform-support to the
device tree. This allows the guest to determine which modes are
supported by the hypervisor.

Update the option vector processing in h_client_architecture_support()
to handle the new MMU bits. This allows guests to request hash or
radix mode and QEMU to create the guest's HPT at this time if it is
necessary but hasn't yet been done.  QEMU will terminate the guest if
it requests an unavailable mode, as required by the architecture.

Extend the ibm,pa-features node with the new ISA 3.0 values
and set the radix bit if KVM supports radix mode. This probably won't
be used directly by guests to determine the availability of radix mode
(that is indicated by the new node added above) but the architecture
requires that it be set when the hardware supports it.

If QEMU is using KVM, and KVM is capable of running in radix mode,
guests can be run in real-mode without allocating a HPT (because KVM
will use a minimal RPT). So in this case, we avoid creating the HPT
at reset time and later (during CAS) create it if it is necessary.

ISA 3.0 guests will now begin to call h_register_process_table(),
which has been added previously.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Strip some unneeded prefix from error messages]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh b4db54132f target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL
The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the
hypervisor where in memory its process table is and how translation should
be performed using this process table.

Provide the implementation of this H_CALL for a guest.

We first check for invalid flags, then parse the flags to determine the
operation, and then check the other parameters for valid values based on
the operation (register new table/deregister table/maintain registration).
The process table is then stored in the appropriate location and registered
with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits
are updated as required.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Correct missing prototype and uninitialized variable]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh d77a98b015 target/ppc: Add new H-CALL shells for in memory table translation
The use of the new in memory tables introduced in ISAv3.00 for translation,
also referred to as process tables, requires the introduction of 3 new
H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID.

Add shells for each of these and register them as the hypercall handlers.
Currently they all log an unimplemented hypercall and return H_FUNCTION.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Cédric Le Goater 147ff8079e ppc/spapr: QOM'ify sPAPRRTCState
Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold
the RTC object. Overall, these changes remove an unnecessary and
implicit dependency on SysBus.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Laurent Vivier fe6824d126 spapr: fix memory hot-unplugging
If, once the kernel has booted, we try to remove a memory
hotplugged while the kernel was not started, QEMU crashes on
an assert:

    qemu-system-ppc64: hw/virtio/vhost.c:651:
                       vhost_commit: Assertion `r >= 0' failed.
    ...
    #4  in vhost_commit
    #5  in memory_region_transaction_commit
    #6  in pc_dimm_memory_unplug
    #7  in spapr_memory_unplug
    #8  spapr_machine_device_unplug
    #9  in hotplug_handler_unplug
    #10 in spapr_lmb_release
    #11 in detach
    #12 in set_allocation_state
    #13 in rtas_set_indicator
    ...

If we take a closer look to the guest kernel log, we can see when
we try to unplug the memory:

    pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)

What happens:

    1- The kernel has ignored the memory hotplug event because
       it was not started when it was generated.

    2- When we hot-unplug the memory,
       QEMU starts to remove the memory,
            generates an hot-unplug event,
        and signals the kernel of the incoming new event

    3- as the kernel is started, on the QEMU signal, it reads
       the event list, decodes the hotplug event and tries to
       finish the hotplugging.

    4- QEMU receive the the hotplug notification while it
       is trying to hot-unplug the memory. This moves the memory
       DRC to an invalid state

This patch prevents this by not allowing to set the allocation
state to USABLE while the DRC is awaiting release.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-29 11:35:16 +11:00
Peter Maydell 17783ac828 ppc patch queuye for 2017-03-03
This will probably be my last pull request before the hard freeze.  It
 has some new work, but that has all been posted in draft before the
 soft freeze, so I think it's reasonable to include in qemu-2.9.
 
 This batch has:
     * A substantial amount of POWER9 work
         * Implements the legacy (hash) MMU for POWER9
 	* Some more preliminaries for implementing the POWER9 radix
           MMU
 	* POWER9 has_work
 	* Basic POWER9 compatibility mode handling
 	* Removal of some premature tests
     * Some cleanups and fixes to the existing MMU code to make the
       POWER9 work simpler
     * A bugfix for TCG multiply adds on power
     * Allow pseries guests to access PCIe extended config space
 
 This also includes a code-motion not strictly in ppc code - moving
 getrampagesize() from ppc code to exec.c.  This will make some future
 VFIO improvements easier, Paolo said it was ok to merge via my tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYuOEEAAoJEGw4ysog2bOSHD0P/jBg/qr/4KnsB1KhnlVrB2sP
 vy2d3bGGlUWr9Z+CK/PMCRB8ekFgQLjidLIXji6mviUocv6m3WsVrnbLF/oOL/IT
 NPMVAffw7q804YVu1Ns9R82d6CIqHTy//bpg69tFMcJmhL9fqPan3wTZZ9JeiyAm
 SikqkAHBSW4SxKqg8ApaSqx5L2QTqyfkClR0sLmgM0JtmfJrbobpQ6bMtdPjUZ9L
 n2gnpO2vaWCa1SEQrRrdELqvcD8PHkSJapWOBXOkpGWxoeov/PYxOgkpdDUW4qYY
 lVLtp1Vd3OB/h3Unqfw32DNiHA5p89hWPX5UybKMgRVL9Cv2/lyY47pcY8XTeNzn
 bv84YRbFJeI+GgoEnghmtq+IM8XiW/cr9rWm9wATKfKGcmmFauumALrsffUpHVCM
 4hSNgBv5t2V9ptZ+MDlM/Ku+zk9GoqwQ+hemdpVtiyhOtGUPGFBn5YLE4c2DHFxV
 +L9JtBnFn8obnssNoz0wL+QvZchT1qUHMhH5CWAanjw9CTDp/YwQ2P01zK+00s9d
 4cB7fUG3WNto5eXXEGMaXeDsUEz8z//hTe3j5sVbnHsXi0R3dhv7iryifmx4bUKU
 H9EwAc+uNUHbvBy7u6IWg0I8P2n00CCO6JqXijQ92zELJ5j0XhzHUI2dOXn+zyEo
 3FZu56LFnSSUBEXuTjq4
 =PcNw
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' into staging

ppc patch queuye for 2017-03-03

This will probably be my last pull request before the hard freeze.  It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.

This batch has:
    * A substantial amount of POWER9 work
        * Implements the legacy (hash) MMU for POWER9
	* Some more preliminaries for implementing the POWER9 radix
          MMU
	* POWER9 has_work
	* Basic POWER9 compatibility mode handling
	* Removal of some premature tests
    * Some cleanups and fixes to the existing MMU code to make the
      POWER9 work simpler
    * A bugfix for TCG multiply adds on power
    * Allow pseries guests to access PCIe extended config space

This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c.  This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.

# gpg: Signature made Fri 03 Mar 2017 03:20:36 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.9-20170303:
  target/ppc: rewrite f[n]m[add,sub] using float64_muladd
  spapr: Small cleanup of PPC MMU enums
  spapr_pci: Advertise access to PCIe extended config space
  target/ppc: Rework hash mmu page fault code and add defines for clarity
  target/ppc: Move no-execute and guarded page checking into new function
  target/ppc: Add execute permission checking to access authority check
  target/ppc: Add Instruction Authority Mask Register Check
  hw/ppc/spapr: Add POWER9 to pseries cpu models
  target/ppc/POWER9: Add cpu_has_work function for POWER9
  target/ppc/POWER9: Add POWER9 pa-features definition
  target/ppc/POWER9: Add POWER9 mmu fault handler
  target/ppc: Don't gen an SDR1 on POWER9 and rework register creation
  target/ppc: Add patb_entry to sPAPRMachineState
  target/ppc/POWER9: Add POWERPC_MMU_V3 bit
  powernv: Don't test POWER9 CPU yet
  exec, kvm, target-ppc: Move getrampagesize() to common code
  target/ppc: Add POWER9/ISAv3.00 to compat_table

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-04 16:31:14 +00:00
Paolo Bonzini eeb61d4f82 ppc: avoid typedef redefinitions
These cause compilation failures on CentOS 6 or other operating
systems with older GCCs.

Cc: David Gibson <dgibson@redhat.com>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1488558530-21016-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-04 15:14:34 +00:00
Suraj Jitindar Singh 9861bb3efd target/ppc: Add patb_entry to sPAPRMachineState
ISA v3.00 adds the idea of a partition table which is used to store the
address translation details for all partitions on the system. The partition
table consists of double word entries indexed by partition id where the second
double word contains the location of the process table in guest memory. The
process table is registered by the guest via a h-call.

We need somewhere to store the address of the process table so we add an entry
to the sPAPRMachineState struct called patb_entry to represent the second
doubleword of a single partition table entry corresponding to the current
guest. We need to store this value so we know if the guest is using radix or
hash translation and the location of the corresponding process table in guest
memory. Since we only have a single guest per qemu instance, we only need one
entry.

Since the partition table is technically a hypervisor resource we require that
access to it is abstracted by the virtual hypervisor through the get_patbe()
call. Currently the value of the entry is never set (and thus
defaults to 0 indicating hash), but it will be required to both implement
POWER9 kvm support and tcg radix support.

We also add this field to be migrated as part of the sPAPRMachineState as we
will need it on the receiving side as the guest will never tell us this
information again and we need it to perform translation.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-03 11:30:59 +11:00
Cédric Le Goater 6449da4545 ppc/xics: move InterruptStatsProvider to the sPAPR machine
It provides a better monitor output of the ICP and ICS objects, else
the objects are printed out of order.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater a7ff1212e9 ppc/xics: move ics-simple post_load under the machine
The ICS object uses a post_load() handler which is implicitly relying
on the fact that the internal state of the ICS and ICP objects has
been restored but this is not guaranteed. So, let's move the code
under the post_load() handler of the machine where we know the objects
have been fully restored.

The icp_resend() handler of the XICSFabric QOM interface is also
removed as it is now obsolete.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater e6f7e110ee ppc/xics: remove the XICSState classes
The XICSState classes are not used anymore. They have now been fully
deprecated by the XICSFabric QOM interface. Do the cleanups.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater 2192a9303d ppc/xics: export the XICS init routines
There is nothing left related to the XICS object in the realize
functions of the KVMXICSState and XICSState class. So adapt the
interfaces to call these routines directly from the sPAPR machine init
sequence.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater 852ad27e14 ppc/xics: move the ICP array under the sPAPR machine
This is the last step to remove the XICSState abstraction and have the
machine hold all the objects related to interrupts : ICSs and ICPs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater b0ec31290c ppc/xics: simplify spapr_dt_xics() interface
spapr_dt_xics() only needs the number of servers to build the device
tree nodes. Let's change the routine interface to reflect that.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater b4f27d71e3 ppc/xics: use the QOM interface to grab an ICP
Also introduce a xics_icp_get() helper to simplify the changes.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater f023243432 ppc/xics: move the cpu_setup() handler under the ICPState class
The cpu_setup() handler is currently under the XICSState class but it
really belongs under ICPState as it is setting up an individual vCPU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater bf50860d1b ppc/xics: simplify the cpu_setup() handler
The cpu_setup() handler currently takes a 'XICSState *' argument to
grab the kernel ICP file descriptor. This interface can be simplified
by using the 'xics' backlink of the ICP object.

This change is also required by subsequent patches which makes use of
the QOM interface for XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater b2fc59aaf9 ppc/xics: extend the QOM interface to handle ICPs
Let's add two new handlers for ICPs. One is to get an ICP object from
a server number and a second is to resend the irqs when needed.

The icp_resend() handler is a temporary workaround needed by the
ics-simple post_load() handler. It will be removed when the post_load
portion can be done at the machine level.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater d114a66225 ppc/xics: remove the XICS list of ICS
This is not used anymore.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater be1fe35199 ppc/xics: remove xics_find_source()
It is not used anymore now that we have the QOM interface for XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 2cd908d0ad ppc/xics: use the QOM interface to resend irqs
Also change the ICPState 'xics' backlink to be a XICSFabric, this
removes the need of using qdev_get_machine() to get the QOM interface
in some of the routines.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater f7759e4331 ppc/xics: use the QOM interface to get irqs
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 7844e12b28 ppc/xics: use the QOM interface under the sPAPR machine
Add 'ics_get' and 'ics_resend' handlers to the sPAPR machine. These
are relatively simple for a single ICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 51b180051e ppc/xics: introduce a XICSFabric QOM interface to handle ICSs
This interface provides two simple handlers. One is to get an ICS
(Interrupt Source Controller) object from an irq number and a second
to resend the irqs when needed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 681bfaded6 ppc/xics: store the ICS object under the sPAPR machine
A list of ICS objects was introduced under the XICS object for the
PowerNV machine but, for the sPAPR machine, it brings extra complexity
as there is only a single ICS. To simplify the code, let's add the ICS
pointer under the sPAPR machine and try to reduce the use of this list
where possible.

Also, change the xics_spapr_*() routines to use an ICS object instead
of an XICSState and change their name to reflect that these are
specific to the sPAPR ICS object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 817bb6a446 ppc/xics: remove set_nr_servers() handler from XICSStateClass
Today, the ICP (Interrupt Controller Presenter) objects are created by
the 'nr_servers' property handler of the XICS object and a class
handler. They are realized in the XICS object realize routine.

Let's simplify the process by creating the ICP objects along with the
XICS object at the machine level.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 4e4169f7a2 ppc/xics: remove set_nr_irqs() handler from XICSStateClass
Today, the ICS (Interrupt Controller Source) object is created and
realized by the init and realize routines of the XICS object, but some
of the parameters are only known at the machine level.

These parameters are passed from the sPAPR machine to the ICS object
in a rather convoluted way using property handlers and a class handler
of the XICS object. The number of irqs required to allocate the IRQ
state objects in the ICS realize routine is one of them.

Let's simplify the process by creating the ICS object along with the
XICS object at the machine level and link the ICS into the XICS list
of ICSs at this level also. In the sPAPR machine, there is only a
single ICS but that will change with the PowerNV machine.

Also, QOMify the creation of the objects and get rid of the
superfluous code.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
David Gibson 738d5db824 xics: XICS should not be a SysBusDevice
Currently xics - the component of the IBM POWER interrupt controller
representing the overall interrupt fabric / architecture is
represented as a descendent of SysBusDevice.  However, this is not
really correct - the xics presents nothing in MMIO space so it should
be an "unattached" device in the current QOM model.

Since this device will always be created by the machine type, not created
specifically from the command line, and because it has no migrated state
it should be safe to move it around the device composition tree.

Therefore this patch changes it to a descendent of TYPE_DEVICE, and
makes it an unattached device.  So that its reset handler still gets
called correctly, we add a qdev_set_parent_bus() to attach it to
sysbus.  It's not really clear that's correct (instead of using
register_reset()) but it appears to a common technique.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[clg corrected problems with reset]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg folded together and updated commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Igor Mammedov 535455fdee spapr: reuse machine->possible_cpus instead of cores[]
Replace SPAPR specific cores[] array with generic
machine->possible_cpus and store core objects there.
It makes cores bookkeeping similar to x86 cpus and
will allow to unify similar code.
It would allow to replace cpu_index based NUMA node
mapping with iproperty based one (for -device created
cores) since possible_cpus carries board defined
topology/layout.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 11:28:28 +11:00
Igor Mammedov ff9006ddbf spapr: move spapr_core_[foo]plug() callbacks close to machine code in spapr.c
spapr_core_pre_plug/spapr_core_plug/spapr_core_unplug() are managing
wiring CPU core into spapr machine state and not internal CPU core state.
So move them from spapr_cpu_core.c to spapr.c where other similar
(spapr_memory_[foo]plug()) callbacks are located, which also matches
x86 target practice.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 11:28:27 +11:00
Nicholas Piggin 1c7ad77e56 ppc/spapr: implement H_SIGNAL_SYS_RESET
The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset
exception on CPUs within the same guest -- all CPUs, all-but-self, or a
specific CPU (including self).

This has not made its way to a PAPR release yet, but we have an hcall
number assigned.

  H_SIGNAL_SYS_RESET = 0x380

  Syntax:
    hcall(uint64 H_SIGNAL_SYS_RESET, int64 target);

  Generate a system reset NMI on the threads indicated by target.

  Values for target:
    -1 = target all online threads including the caller
    -2 = target all online threads except for the caller
    All other negative values: reserved
    Positive values: The thread to be targeted, obtained from the value
    of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF
    device tree.

  Semantics:
    - Invalid target: return H_Parameter.
    - Otherwise: Generate a system reset NMI on target thread(s),
      return H_Success.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-31 10:10:13 +11:00
David Gibson 5b120785e7 pseries: Make cpu_update during CAS unconditional
spapr_h_cas_compose_response() includes a cpu_update parameter which
controls whether it includes updated information on the CPUs in the device
tree fragment returned from the ibm,client-architecture-support (CAS) call.

Providing the updated information is essential when CAS has negotiated
compatibility options which require different cpu information to be
presented to the guest.  However, it should be safe to provide in other
cases (it will just override the existing data in the device tree with
identical data).  This simplifies the code by removing the parameter and
always providing the cpu update information.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-01-31 10:10:13 +11:00
David Gibson 0c86d0fd92 pseries: Always use core objects for CPU construction
Currently the pseries machine has two paths for constructing CPUs.  On
newer machine type versions, which support cpu hotplug, it constructs
cpu core objects, which in turn construct CPU threads.  For older machine
versions it individually constructs the CPU threads.

This division is going to make some future changes to the cpu construction
harder, so this patch unifies them.  Now cpu core objects are always
created.  This requires some updates to allow core objects to be created
without a full complement of threads (since older versions allowed a
number of cpus not a multiple of the threads-per-core).  Likewise it needs
some changes to the cpu core hot/cold plug path so as not to choke on the
old machine types without hotplug support.

For good measure, we move the cpu construction to its own subfunction,
spapr_init_cpus().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2017-01-31 10:10:13 +11:00
Marc-André Lureau 0ec7b3e7f2 char: rename CharDriverState Chardev
Pick a uniform chardev type name.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:59 +01:00
Thomas Huth fcf5ef2ab5 Move target-* CPU file into a target/ folder
We've currently got 18 architectures in QEMU, and thus 18 target-xxx
folders in the root folder of the QEMU source tree. More architectures
(e.g. RISC-V, AVR) are likely to be included soon, too, so the main
folder of the QEMU sources slowly gets quite overcrowded with the
target-xxx folders.
To disburden the main folder a little bit, let's move the target-xxx
folders into a dedicated target/ folder, so that target-xxx/ simply
becomes target/xxx/ instead.

Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part]
Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part]
Acked-by: Michael Walle <michael@walle.cc> [lm32 part]
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part]
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part]
Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part]
Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part]
Acked-by: Richard Henderson <rth@twiddle.net> [alpha part]
Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part]
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part]
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2016-12-20 21:52:12 +01:00
Michael Roth 62ef3760d4 spapr: migration support for CAS-negotiated option vectors
With the additional of the OV5_HP_EVT option vector, we now have
certain functionality (namely, memory unplug) that checks at run-time
for whether or not the guest negotiated the option via CAS. Because
we don't currently migrate these negotiated values, we are unable
to unplug memory from a guest after it's been migrated until after
the guest is rebooted and CAS-negotiation is repeated.

This patch fixes this by adding CAS-negotiated options to the
migration stream. We do this using a subsection, since the
negotiated value of OV5_HP_EVT is the only option currently needed
to maintain proper functionality for a running guest.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-11-23 12:00:48 +11:00
Cédric Le Goater ad521238b4 ppc/pnv: add a 'xscom_core_base' field to PnvChipClass
The XSCOM addresses for the core registers are encoded in a slightly
different way on POWER8 and POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-11-15 10:08:43 +11:00
Cédric Le Goater ec575aa0ae ppc/pnv: fix compile breakage on old gcc
PnvChip is defined twice and this can confuse old compilers :

  CC      ppc64-softmmu/hw/ppc/pnv_xscom.o
In file included from qemu.git/hw/ppc/pnv.c:29:
qemu.git/include/hw/ppc/pnv.h:60: error: redefinition of typedef ‘PnvChip’
qemu.git/include/hw/ppc/pnv_xscom.h:24: note: previous declaration of ‘PnvChip’ was here
make[1]: *** [hw/ppc/pnv.o] Error 1
make[1]: *** Waiting for unfinished jobs....

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-11-15 10:05:51 +11:00
Bharata B Rao afdbd40356 spapr: Add DRC count indexed hotplug identifier type
Add support for DRC count indexed hotplug ID type which is primarily
needed for memory hot unplug. This type allows for specifying the
number of DRs that should be plugged/unplugged starting from a given
DRC index.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* updated rtas_event_log_v6_hp to reflect count/index field ordering
  used in PAPR hotplug ACR
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 11:17:35 +11:00
Michael Roth ffbb1705a3 spapr_events: add support for dedicated hotplug event source
Hotplug events were previously delivered using an EPOW interrupt
and were queued by linux guests into a circular buffer. For traditional
EPOW events like shutdown/resets, this isn't an issue, but for hotplug
events there are cases where this buffer can be exhausted, resulting
in the loss of hotplug events, resets, etc.

Newer-style hotplug event are delivered using a dedicated event source.
We enable this in supported guests by adding standard an additional
event source in the guest device-tree via /event-sources, and, if
the guest advertises support for the newer-style hotplug events,
using the corresponding interrupt to signal the available of
hotplug/unplug events.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 11:17:35 +11:00
Michael Roth 417ece33fc spapr: improve ibm,architecture-vec-5 property handling
ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
negotiated between platform/guest. Currently we hardcode this property
in the boot-time device tree to advertise a single negotiated
capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
has been invoked or that capability has actually been negotiated.

Improve this by generating ibm,architecture-vec-5 based on the full
set of option vector 5 capabilities negotiated via CAS.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:26 +11:00
Michael Roth 6787d27b04 spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.

We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.

Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:

1) When ibm,client-architecture-support gets called, we
   call spapr_dt_cas_updates() with the set of capabilities
   added since the previous call to ibm,client-architecture-support.
   For the initial boot, or a system reset generated by something
   other than the CAS call itself, this set will consist of *all*
   options supported both the platform and the guest. For calls
   to ibm,client-architecture-support immediately after a CAS-induced
   reset, we call spapr_dt_cas_updates() with only the set
   of capabilities added since the previous call, since the other
   capabilities will have already been addressed by the boot-time
   device-tree this time around. In the unlikely event that
   capabilities are *removed* since the previous CAS, we will
   generate a CAS-induced reset. In the unlikely event that we
   cannot fit the device-tree updates into the buffer provided
   by the guest, well generate a CAS-induced reset.

2) When a CAS update results in the need to reset the machine and
   include the updates in the boot-time device tree, we call the
   spapr_dt_cas_updates() using the full set of negotiated
   capabilities as part of the reset path. At initial boot, or after
   a reset generated by something other than the CAS call itself,
   this set will be empty, resulting in what should be the same
   boot-time device-tree as we generated prior to this patch. For
   CAS-induced reset, this routine will be called with the full set of
   capabilities negotiated by the platform/guest in the previous
   CAS call, which should result in CAS updates from previous call
   being accounted for in the initial boot-time device tree.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:26 +11:00
Michael Roth facdb8b63b spapr_hcall: use spapr_ovec_* interfaces for CAS options
Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. Currently this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.

As more capability bits are added, will would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.

Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:26 +11:00
Michael Roth b20b7b7add spapr_ovec: initial implementation of option vector helpers
PAPR guests advertise their capabilities to the platform by passing
an ibm,architecture-vec structure via an
ibm,client-architecture-support hcall as described by LoPAPR v11,
B.6.2.3. during early boot.

Using this information, the platform enables the capabilities it
supports, then encodes a subset of those enabled capabilities (the
5th option vector of the ibm,architecture-vec structure passed to
ibm,client-architecture-support) into the guest device tree via
"/chosen/ibm,architecture-vec-5".

The logical format of these these option vectors is a bit-vector,
where individual bits are addressed/documented based on the byte-wise
offset from the beginning of the bit-vector, followed by the bit-wise
index starting from the byte-wise offset. Thus the bits of each of
these bytes are stored in reverse order. Additionally, the first
byte of each option vector is encodes the length of the option vector,
so byte offsets begin at 1, and bit offset at 0.

This is not very intuitive for the purposes of mapping these bits to
a particular documented capability, so this patch introduces a set
of abstractions that encapsulate the work of parsing/encoding these
options vectors and testing for individual capabilities.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: Tweaked double-include protection to not trigger a checkpatch
 false positive]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:26 +11:00
David Gibson 398a0bd5ae pseries: Remove spapr_create_fdt_skel()
For historical reasons construction of the guest device tree in spapr is
divided between spapr_create_fdt_skel() which is called at init time, and
spapr_build_fdt() which runs at reset time.  Over time, more and more
things have needed to be moved to reset time.

Previous cleanups mean the only things left in spapr_create_fdt_skel() are
the properties of the root node itself.  Finish consolidating these two
parts of device tree construction, by moving this to the start of
spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson bf5a6696ba pseries: Consolidate construction of /vdevice device tree node
Construction of the /vdevice node (and its children) is divided between
spapr_create_fdt_skel() (at init time), which creates the base node, and
spapr_populate_vdevice() (at reset time) which creates the nodes for each
individual virtual device.

This consolidates both into a single function called from
spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson ffb1e275a6 pseries: Move /event-sources construction to spapr_build_fdt()
The /event-sources device tree node is built from spapr_create_fdt_skel().
As part of consolidating device tree construction to reset time, this moves
it to spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 3f5dabceba pseries: Consolidate construction of /rtas device tree node
For historical reasons construction of the /rtas node in the device
tree (amongst others) is split into several places.  In particular
it's split between spapr_create_fdt_skel(), spapr_build_fdt() and
spapr_rtas_device_tree_setup().

In fact, as well as adding the actual RTAS tokens to the device tree,
spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity
property, which despite going in the /rtas node, doesn't have a lot to
do with RTAS.

This patch consolidates the code constructing /rtas together into a new
spapr_dt_rtas() function.  spapr_rtas_device_tree_setup() is renamed to
spapr_dt_rtas_tokens() and now only adds the token properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 7c866c6a60 pseries: Consolidate construction of /chosen device tree node
For historical reasons, building the /chosen node in the guest device tree
is split across several places and includes both parts which write the DT
sequentially and others which use random access functions.

This patch consolidates construction of the node into one place, using
random access functions throughout.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 9b9a19080a pseries: Move construction of /interrupt-controller fdt node
Currently the device tree node for the XICS interrupt controller is in
spapr_create_fdt_skel().  As part of consolidating device tree construction
to reset time, this moves it to a function called from spapr_build_fdt().

In addition we move the actual code into hw/intc/xics_spapr.c with the
rest of the PAPR specific interrupt controller code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 2cac78c12a pseries: Consolidate RTAS loading
At each system reset, the pseries machine needs to load RTAS (the runtime
portion of the guest firmware) into the VM.  This means copying
the actual RTAS code into guest memory, and also updating the device
tree so that the guest OS and boot firmware can locate it.

For historical reasons the copy and update to the device tree were in
different parts of the code.  This cleanup brings them both together in
an spapr_load_rtas() function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson a19f7fb045 pseries: Make spapr_create_fdt_skel() get information from machine state
Currently spapr_create_fdt_skel() takes a bunch of individual parameters
for various things it will put in the device tree.  Some of these can
already be taken directly from sPAPRMachineState.  This patch alters it so
that all of them can be taken from there, which will allow this code to
be moved away from its current caller in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:25 +11:00
David Gibson cae172ab6d pseries: Remove rtas_addr and fdt_addr fields from machinestate
These values are used only within ppc_spapr_reset(), so just change them
to local variables.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:25 +11:00
Cédric Le Goater 3495b6b610 ppc/pnv: add a ISA bus
As Qemu only supports a single instance of the ISA bus, we use the LPC
controller of chip 0 to create one and plug in a couple of useful
devices, like an UART and RTC. An IPMI BT device, which is also an ISA
device, can be defined on the command line to connect an external BMC.
That is for later.

The PowerNV machine now has a console. Skiboot should load a kernel
and jump into it but execution will stop quite early because we lack a
model for the native XICS controller for the moment :

    [    0.000000] NR_IRQS:512 nr_irqs:512 16
    [    0.000000] XICS: Cannot find a Presentation Controller !
    [    0.000000] ------------[ cut here ]------------
    [    0.000000] WARNING: at arch/powerpc/platforms/powernv/setup.c:81
    ...
    [    0.000000] NIP [c00000000079d65c] pnv_init_IRQ+0x30/0x44

You can still do a few things under xmon.

Based on previous work from :
      Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Trivial fix for a change in the serial_hds_isa_init() interface]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Benjamin Herrenschmidt a3980bf517 ppc/pnv: add a LPC controller
The LPC (Low Pin Count) interface on a POWER8 is made accessible to
the system through the ADU (XSCOM interface). This interface is part
of set of units connected together via a local OPB (On-Chip Peripheral
Bus) which act as a bridge between the ADU and the off chip LPC
endpoints, like external flash modules.

The most important units of this OPB are :
 - OPB Master: contains the ADU slave logic, a set of internal
   registers and the logic to control the OPB.
 - LPCHC (LPC HOST Controller): which implements a OPB Slave, a set of
   internal registers and the LPC HOST Controller to control the LPC
   interface.

Four address spaces are provided to the ADU :
 - LPC Bus Firmware Memory
 - LPC Bus Memory
 - LPC Bus I/O (ISA bus)
 - and the registers for the OPB Master and the LPC Host Controller

On POWER8, an intermediate hop is necessary to reach the OPB, through
a unit called the ECCB. OPB commands are simply mangled in ECCB write
commands.

On POWER9, the OPB master address space can be accessed via MMIO. The
logic is same but the code will be simpler as the XSCOM and ECCB hops
are not necessary anymore.

This version of the LPC controller model doesn't yet implement support
for the SerIRQ deserializer present in the Naples version of the chip
though some preliminary work is there.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.7
      - ported on latest PowerNV patchset
      - changed the XSCOM interface to fit new model
      - QOMified the model
      - moved the ISA hunks in another patch
      - removed printf logging
      - added a couple of UNIMP logging
      - rewrote commit log ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Cédric Le Goater 24ece07250 ppc/pnv: add XSCOM handlers to PnvCore
Now that we are using real HW ids for the cores in PowerNV chips, we
can route the XSCOM accesses to them. We just need to attach a
specific XSCOM memory region to each core in the appropriate window
for the core number.

To start with, let's install the DTS (Digital Thermal Sensor) handlers
which should return 38°C for each core.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Cédric Le Goater 967b75230b ppc/pnv: add XSCOM infrastructure
On a real POWER8 system, the Pervasive Interconnect Bus (PIB) serves
as a backbone to connect different units of the system. The host
firmware connects to the PIB through a bridge unit, the
Alter-Display-Unit (ADU), which gives him access to all the chiplets
on the PCB network (Pervasive Connect Bus), the PIB acting as the root
of this network.

XSCOM (serial communication) is the interface to the sideband bus
provided by the POWER8 pervasive unit to read and write to chiplets
resources. This is needed by the host firmware, OPAL and to a lesser
extent, Linux. This is among others how the PCI Host bridges get
configured at boot or how the LPC bus is accessed.

To represent the ADU of a real system, we introduce a specific
AddressSpace to dispatch XSCOM accesses to the targeted chiplets. The
translation of an XSCOM address into a PCB register address is
slightly different between the P9 and the P8. This is handled before
the dispatch using a 8byte alignment for all.

To customize the device tree, a QOM InterfaceClass, PnvXScomInterface,
is provided with a populate() handler. The chip populates the device
tree by simply looping on its children. Therefore, each model needing
custom nodes should not forget to declare itself as a child at
instantiation time.

Based on previous work done by :
      Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Added cpu parameter to xscom_complete()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Cédric Le Goater d2fd9612ee ppc/pnv: add a PnvCore object
This is largy inspired by sPAPRCPUCore with some simplification, no
hotplug for instance. A set of PnvCore objects is added to the PnvChip
and the device tree is populated looping on these cores.

Real HW cpu ids are now generated depending on the chip cpu model, the
chip id and a core mask. The id is propagated to the CPU object, using
properties, to set the SPR_PIR (Processor Identification Register)

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Cédric Le Goater 631adaff31 ppc/pnv: add a PIR handler to PnvChip
The Processor Identification Register (PIR) is a register that holds a
processor identifier which is used for bus transactions (XSCOM) and
for processor differentiation in multiprocessor systems. It also used
in the interrupt vector entries (IVE) to identify the thread serving
the interrupts.

P9 and P8 have some differences in the CPU PIR encoding.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Cédric Le Goater 397a79e757 ppc/pnv: add a core mask to PnvChip
This will be used to build real HW ids for the cores and enforce some
limits on the available cores per chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Cédric Le Goater e997040e3f ppc/pnv: add a PnvChip object
This is is an abstraction of a POWER8 chip which is a set of cores
plus other 'units', like the pervasive unit, the interrupt controller,
the memory controller, the on-chip microcontroller, etc. The whole can
be seen as a socket. It depends on a cpu model and its characteristics:
max cores and specific inits are defined in a PnvChipClass.

We start with an near empty PnvChip with only a few cpu constants
which we will grow in the subsequent patches with the controllers
required to run the system.

The Chip CFAM (Common FRU Access Module) ID gives the model of the
chip and its version number. It is generally the first thing firmwares
fetch, available at XSCOM PCB address 0xf000f, to start initialization.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:25 +11:00
Benjamin Herrenschmidt 9e933f4a62 ppc/pnv: add skeleton PowerNV platform
The goal is to emulate a PowerNV system at the level of the skiboot
firmware, which loads the OS and provides some runtime services. Power
Systems have a lower firmware (HostBoot) that does low level system
initialization, like DRAM training. This is beyond the scope of what
qemu will address in a PowerNV guest.

No devices yet, not even an interrupt controller. Just to get started,
some RAM to load the skiboot firmware, the kernel and initrd. The
device tree is fully created in the machine reset op.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.7
      - replaced fprintf by error_report
      - used a common definition of _FDT macro
      - removed VMStateDescription as migration is not yet supported
      - added IBM Copyright statements
      - reworked kernel_filename handling
      - merged PnvSystem and sPowerNVMachineState
      - removed PHANDLE_XICP
      - added ppc_create_page_sizes_prop helper
      - removed nmi support
      - removed kvm support
      - updated powernv machine to version 2.8
      - removed chips and cpus, They will be provided in another patches
      - added a machine reset routine to initialize the device tree (also)
      - french has a squelette and english a skeleton.
      - improved commit log.
      - reworked prototypes parameters
      - added a check on the ram size (thanks to Michael Ellerman)
      - fixed chip-id cell
      - changed MAX_CPUS to 2048
      - simplified memory node creation to one node only
      - removed machine version
      - rewrote the device tree creation with the fdt "rw" routines
      - s/sPowerNVMachineState/PnvMachineState/
      - etc.]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:24 +11:00
David Gibson e763da2344 pseries: Remove unused callbacks from sPAPR VIO bus state
The original QOMification of the spapr VIO devices in 3954d33 "spapr:
convert to QEMU Object Model (v2)" moved some callbacks from the
VIOsPAPRBus structure to the VIOsPAPRDeviceClass.  Except, that it
forgot to actually remove them from the VIOsPAPRBus structure (which
still exists, though it doesn't fulfill quite the same function as it
did pre-QOM).

This patch removes those now unused callback fields.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-10-28 09:36:58 +11:00
Cédric Le Goater e3403258a2 ppc/xics: change the icp_ routines API to use an 'ICPState *' argument
The routines :

	void icp_set_cppr(ICPState *icp, uint8_t cppr);
	void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
	void icp_eoi(ICPState *icp, uint32_t xirr);

now use one 'ICPState *icp' argument instead of a 'XICSState *' and a
server arguments. The backlink on XICSState* is used whenever needed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:36:58 +11:00
Cédric Le Goater d49c603b37 ppc/xics: add a XICSState backlink in ICPState
The link will be used to change the API of the icp_* routines which
are still using an XICSState as an argument.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:36:58 +11:00
Cédric Le Goater 2bb0d10aeb ppc/xics: add a xics_set_nr_servers common routine
xics_spapr and xics_kvm nearly define the same 'set_nr_servers'
handler. Only the type of the ICP differs. So let's make a common one
to remove some duplicated code.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:36:58 +11:00
David Gibson daa2369903 spapr_pci: Add a 64-bit MMIO window
On real hardware, and under pHyp, the PCI host bridges on Power machines
typically advertise two outbound MMIO windows from the guest's physical
memory space to PCI memory space:
  - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
  - A 64-bit window which maps onto a large region somewhere high in PCI
    address space (traditionally this used an identity mapping from guest
    physical address to PCI address, but that's not always the case)

The qemu implementation in spapr-pci-host-bridge, however, only supports a
single outbound MMIO window, however.  At least some Linux versions expect
the two windows however, so we arranged this window to map onto the PCI
memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
windows, the "32-bit" window from 2G..4G and the "64-bit" window from
4G..~64G.

This approach means, however, that the 64G window is not naturally aligned.
In turn this limits the size of the largest BAR we can map (which does have
to be naturally aligned) to roughly half of the total window.  With some
large nVidia GPGPU cards which have huge memory BARs, this is starting to
be a problem.

This patch adds true support for separate 32-bit and 64-bit outbound MMIO
windows to the spapr-pci-host-bridge implementation, each of which can
be independently configured.  The 32-bit window always maps to 2G.. in PCI
space, but the PCI address of the 64-bit window can be configured (it
defaults to the same as the guest physical address).

So as not to break possible existing configurations, as long as a 64-bit
window is not specified, a large single window can be specified.  This
will appear the same way to the guest as the old approach, although it's
now implemented by two contiguous memory regions rather than a single one.

For now, this only adds the possibility of 64-bit windows.  The default
configuration still uses the legacy mode.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson 6737d9ad79 spapr_pci: Delegate placement of PCI host bridges to machine type
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
and PAPR guests) to have numerous independent PHBs, each controlling a
separate PCI domain.

There are two ways of configuring the spapr-pci-host-bridge device: first
it can be done fully manually, specifying the locations and sizes of all
the IO windows.  This gives the most control, but is very awkward with 6
mandatory parameters.  Alternatively just an "index" can be specified
which essentially selects from an array of predefined PHB locations.
The PHB at index 0 is automatically created as the default PHB.

The current set of default locations causes some problems for guests with
large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
GPGPU cards via VFIO).  Obviously, for migration we can only change the
locations on a new machine type, however.

This is awkward, because the placement is currently decided within the
spapr-pci-host-bridge code, so it breaks abstraction to look inside the
machine type version.

So, this patch delegates the "default mode" PHB placement from the
spapr-pci-host-bridge device back to the machine type via a public method
in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
can do.

For now, this just changes where the calculation is done.  It doesn't
change the actual location of the host bridges, or any other behaviour.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
Benjamin Herrenschmidt d4d7a59a7a ppc/xics: Split ICS into ics-base and ics class
The existing implementation remains same and ics-base is introduced. The
type name "ics" is retained, and all the related functions renamed as
ics_simple_*

This will allow different implementations for the source controllers
such as the MSI support of PHB3 on Power8 which uses in-memory state
tables for example.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[ clg: added ICS_BASE_GET_CLASS and related fixes, based on :
       http://patchwork.ozlabs.org/patch/646010/ ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14 16:31:02 +11:00
Benjamin Herrenschmidt cc706a5305 ppc/xics: Make the ICSState a list
Instead of an array of fixed sized blocks, use a list, as we will need
to have sources with variable number of interrupts. SPAPR only uses
a single entry. Native will create more. If performance becomes an
issue we can add some hashed lookup but for now this will do fine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[ move the initialization of list to xics_common_initfn,
  restore xirr_owner after migration and move restoring to
  icp_post_load]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[ clg: removed the icp_post_load() changes from nikunj patchset v3:
       http://patchwork.ozlabs.org/patch/646008/ ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14 16:31:02 +11:00
Thomas Huth 3daa4a9f95 hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machine
A couple of distributors are compiling their distributions
with "-mcpu=power8" for ppc64le these days, so the user sooner
or later runs into a crash there when not explicitely specifying
the "-cpu POWER8" option to QEMU (which is currently using POWER7
for the "pseries" machine by default). Due to this reason, the
linux-user target already switched to POWER8 a while ago (see commit
de3f1b9841). Since the softmmu target
of course has the same problem, we should switch there to POWER8 for
the newer machine types, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-06 16:15:53 +11:00
Benjamin Herrenschmidt 15ed653fa4 ppc/xics: An ICS with offset 0 is assumed to be uninitialized
This will make life easier for dealing with dynamically configured
ICSes such as PHB3

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-23 12:39:07 +10:00
Bharata B Rao 7ebaf79556 spapr: Introduce sPAPRCPUCoreClass
Each spapr cpu core type defines an instance_init routine which just
populates the CPU class name. This can be done in the class_init
commonly for all core types which simplifies the registration.
This is inspired by how PowerNV core types are registered.

Certain types of spapr cpu cores ('host' and generic type based on host
CPU) are initialized in target-ppc/kvm.c. To convert these type
registrations to use class_init, we need to expose
spapr_cpu_core_class_init() outside of spapr_cpu_core.c.

Commit d11b268e17 added a generic sPAPR CPU core family
type to support cases like POWER8 CPU type on POWER8E host CPU.
Switching to class_init would fix such scenarios to use the right
CPU thread type instead of defaulting to host-powerpc64-cpu.

In an unrelated cleanup, fix a typo in .get_hotplug_handler routine.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-23 12:39:06 +10:00
Laurent Vivier eeddd59f59 tests: add RTAS command in the protocol
Add a first test to validate the protocol:

- rtas/get-time-of-day compares the time
  from the guest with the time from the host.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-23 10:29:40 +10:00
Ladi Prosek d4b84d564e Remove unused function declarations
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-15 15:32:22 +03:00
Paolo Bonzini 2286459d3a ppc: do not redefine CPUPPCState
Just include the file that is supposed to bring it in.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 19:09:44 +02:00
Cédric Le Goater 3654fa95bc hw/ppc: add a ppc_create_page_sizes_prop() helper routine
The exact same routine will be used in PowerNV.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07 12:40:12 +10:00
Cédric Le Goater ce9863b797 hw/ppc: use error_report instead of fprintf
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07 12:40:12 +10:00
Cédric Le Goater 7804c353a9 hw/ppc: include fdt helper routine in a common file
spapr_pci would also be a good candidate but the macro _FDT is
slightly different. It returns and does not exit.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07 09:52:14 +10:00
Greg Kurz e703d2f71c ppc: parse cpu features once
Considering that features are converted to global properties and
global properties are automatically applied to every new instance
of created CPU (at object_new() time), there is no point in
parsing cpu_model string every time a CPU created. So move
parsing outside CPU creation loop and do it only once.

Parsing also should be done before any CPU is created so that
features would affect the first CPU a well.

This patch does that for all PowerPC machine types.

It is based on previous work from Bharata:

https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07564.html

Signed-off-by: Greg Kurz <groug@kaod.org>
[clg: only kept the fix for the spapr platform. support for other
      platform will be added in 2.8 ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-13 17:32:58 +10:00
David Gibson 3c0c47e346 spapr: Correctly set query_hotpluggable_cpus hook based on machine version
Prior to c8721d3 "spapr: Error out when CPU hotplug is attempted on older
pseries machines", attempting to use query-hotpluggable-cpus on pseries-2.6
and earlier machine types would SEGV.

That change fixed that, but due to some unexpected interactions in init
order and a brown-paper-bag worthy failure to test, it accidentally
disabled query-hotpluggable-cpus for all pseries machine types, including
the current one which should allow it.

In fact, query_hotpluggable_cpus needs to be non-NULL when and only when
the dr_cpu_enabled flag in sPAPRMachineClass is set, which makes
dr_cpu_enabled itself redundant.

This patch removes dr_cpu_enabled, instead directly setting
query_hotpluggable_cpus from the machine class_init functions, and using
that to determine the availability of CPU hotplug when necessary.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-08 09:45:03 +10:00
Markus Armbruster 175de52487 Clean up decorations and whitespace around header guards
Cleaned up with scripts/clean-header-guards.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:20:46 +02:00
Markus Armbruster 2a6a4076e1 Clean up ill-advised or unusual header guards
Cleaned up with scripts/clean-header-guards.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:20:46 +02:00
Markus Armbruster 121d07125b Clean up header guards that don't match their file name
Header guard symbols should match their file name to make guard
collisions less likely.  Offenders found with
scripts/clean-header-guards.pl -vn.

Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:19:16 +02:00
Markus Armbruster a9c94277f0 Use #include "..." for our own headers, <...> for others
Tracked down with an ugly, brittle and probably buggy Perl script.

Also move includes converted to <...> up so they get included before
ours where that's obviously okay.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:19:16 +02:00
Alexey Kardashevskiy ae4de14cd3 spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows to have additional DMA window(s)

The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.6 machine and older disable it.
This also creates a single DMA window for the older machines to
maintain backward migration.

This implements DDW for PHB with emulated and VFIO devices. The host
kernel support is required. The advertised IOMMU page sizes are 4K and
64K; 16M pages are supported but not advertised by default, in order to
enable them, the user has to specify "pgsz" property for PHB and
enable huge pages for RAM.

The existing linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
and updates all references to dma_liobn. However this does not add
64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
rather pointless there (as it is a PHB property and the management
software can/should pass LIOBNs via CLI) but we keep it for the backward
migration support.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-05 14:31:08 +10:00
Benjamin Herrenschmidt 27f2458245 ppc/xics: Replace "icp" with "xics" in most places
The "ICP" is a different object than the "XICS". For historical reasons,
we have a number of places where we name a variable "icp" while it contains
a XICSState pointer. There *is* an ICPState structure too so this makes
the code really confusing.

This is a mechanical replacement of all those instances to use the name
"xics" instead. There should be no functional change.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[spapr_cpu_init has been moved to spapr_cpu_core.c, change there]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01 13:41:47 +10:00
Benjamin Herrenschmidt 1cbd222055 ppc/xics: Implement H_IPOLL using an accessor
None of the other presenter functions directly mucks with the
internal state, so don't do it there either.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01 13:41:47 +10:00
Benjamin Herrenschmidt 9c7027ba94 ppc/xics: Move SPAPR specific code to a separate file
Leave the core ICP/ICS logic in xics.c and move the top level
class wrapper, hypercall and RTAS handlers to xics_spapr.c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[add cpu.h in xics_spapr.c, move set_nr_irqs and set_nr_servers to
 xics_spapr.c]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01 13:41:46 +10:00
Benjamin Herrenschmidt 161deaf225 ppc/xics: Rename existing xics to xics_spapr
The common class doesn't change, the KVM one is sPAPR specific. Rename
variables and functions to xics_spapr.

Retain the type name as "xics" to preserve migration for existing sPAPR
guests.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01 13:41:46 +10:00
Benjamin Herrenschmidt d29f086169 ppc/xics: Remove unused xics_set_irq_type()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[dwg: Adjusted for context to apply without original series]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-27 13:13:30 +10:00
Bharata B Rao 6f4b5c3ec5 spapr: CPU hot unplug support
Remove the CPU core device by removing the underlying CPU thread devices.
Hot removal of CPU for sPAPR guests is achieved by sending the hot unplug
notification to the guest. Release the vCPU object after CPU hot unplug so
that vCPU fd can be parked and reused.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao af81cf323c spapr: CPU hotplug support
Set up device tree entries for the hotplugged CPU core and use the
exising RTAS event logging infrastructure to send CPU hotplug notification
to the guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao 94a94e4c49 spapr: convert boot CPUs into CPU core devices
Introduce sPAPRMachineClass.dr_cpu_enabled to indicate support for
CPU core hotplug. Initialize boot time CPUs as core deivces and prevent
topologies that result in partially filled cores. Both of these are done
only if CPU core hotplug is supported.

Note: An unrelated change in the call to xics_system_init() is done
in this patch as it makes sense to use the local variable smt introduced
in this patch instead of kvmppc_smt_threads() call here.

TODO: We derive sPAPR core type by looking at -cpu <model>. However
we don't take care of "compat=" feature yet for boot time as well
as hotplug CPUs.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao afd10a0fa6 spapr: Move spapr_cpu_init() to spapr_cpu_core.c
Start consolidating CPU init related routines in spapr_cpu_core.c. As
part of this, move spapr_cpu_init() and its dependencies from spapr.c
to spapr_cpu_core.c

No functionality change in this patch.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Rename TIMEBASE_FREQ to SPAPR_TIMEBASE_FREQ, since it's now in a
 public(ish) header]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao 3b54254966 spapr: Abstract CPU core device and type specific core devices
Add sPAPR specific abastract CPU core device that is based on generic
CPU core device. Use this as base type to create sPAPR CPU specific core
devices.

TODO:
- Add core types for other remaining CPU types
- Handle CPU model alias correctly

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao aab99135b6 spapr_drc: Prevent detach racing against attach for CPU DR
If a CPU is hot removed while hotplug of the same is still in progress,
the guest crashes. Prevent this by ensuring that detach is done only
after attach has completed.

The existing code already prevents such race for PCI hotplug. However
given that CPU is a logical DR unlike PCI and starts with ISOLATED
state, we need a logic that works for CPU too.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
               [Don't set awaiting_attach for PCI devices]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao 4a4b344c7c xics,xics_kvm: Handle CPU unplug correctly
XICS is setup for each CPU during initialization. Provide a routine
to undo the same when CPU is unplugged. While here, move ss->cs management
into xics from xics_kvm since there is nothing KVM specific in it.
Also ensure xics reset doesn't set irq for CPUs that are already unplugged.

This allows reboot of a VM that has undergone CPU hotplug and unplug
to work correctly.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao d0e5a8f293 spapr: Ensure all LMBs are represented in ibm,dynamic-memory
Memory hotplug can fail for some combinations of RAM and maxmem when
DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
on maximum addressable memory returned by guest and this value is currently
being calculated wrongly by the guest kernel routine memory_hotplug_max().
While there is an attempt to fix the guest kernel, this patch works
around the problem within QEMU itself.

memory_hotplug_max() routine in the guest kernel arrives at max
addressable memory by multiplying lmb-size with the lmb-count obtained
from ibm,dynamic-memory property. There are two assumptions here:

- All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
  where only hot-pluggable LMBs are present in this property.
- The memory area comprising of RAM and hotplug region is contiguous: This
  needn't be true always for PowerKVM as there can be gap between
  boot time RAM and hotplug region.

To work around this guest kernel bug, ensure that ibm,dynamic-memory
has information about all the LMBs (RMA, boot-time LMBs, future
hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
hotpluggable region).

RMA is represented separately by memory@0 node. Hence mark RMA LMBs
and also the LMBs for the gap b/n RAM and hotpluggable region as
reserved and as having no valid DRC so that these LMBs are not considered
by the guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14 13:20:01 +10:00
Mark Cave-Ayland bc9ca5958d macio: call dma_memory_unmap() at the end of each DMA transfer
This ensures that the underlying memory is marked dirty once the transfer
is complete and resolves cache coherency problems under MacOS 9.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14 10:43:24 +10:00
Alexey Kardashevskiy b4b6eb771a spapr_iommu: Add root memory region
We are going to have multiple DMA windows at different offsets on
a PCI bus. For the sake of migration, we will have as many TCE table
objects pre-created as many windows supported.
So we need a way to map windows dynamically onto a PCI bus
when migration of a table is completed but at this stage a TCE table
object does not have access to a PHB to ask it to map a DMA window
backed by just migrated TCE table.

This adds a "root" memory region (UINT64_MAX long) to the TCE object.
This new region is mapped on a PCI bus with enabled overlapping as
there will be one root MR per TCE table, each of them mapped at 0.
The actual IOMMU memory region is a subregion of the root region and
a TCE table enables/disables this subregion and maps it at
the specific offset inside the root MR which is 1:1 mapping of
a PCI address space.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07 10:17:45 +10:00
Alexey Kardashevskiy a26fdf3934 spapr_iommu: Migrate full state
The source guest could have reallocated the default TCE table and
migrate bigger/smaller table. This adds reallocation in post_load()
if the default table size is different on source and destination.

This adds @bus_offset, @page_shift to the migration stream as
a subsection so when DDW is added, migration to older machines will
still be possible. As @bus_offset and @page_shift are not used yet,
this makes no change in behavior.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07 10:17:45 +10:00
Alexey Kardashevskiy df7625d422 spapr_iommu: Introduce "enabled" state for TCE table
Currently TCE tables are created once at start and their sizes never
change. We are going to change that by introducing a Dynamic DMA windows
support where DMA configuration may change during the guest execution.

This changes spapr_tce_new_table() to create an empty zero-size IOMMU
memory region (IOMMU MR). Only LIOBN is assigned by the time of creation.
It still will be called once at the owner object (VIO or PHB) creation.

This introduces an "enabled" state for TCE table objects, some
helper functions are added:
- spapr_tce_table_enable() receives TCE table parameters, stores in
sPAPRTCETable and allocates a guest view of the TCE table
(in the user space or KVM) and sets the correct size on the IOMMU MR;
- spapr_tce_table_disable() disposes the table and resets the IOMMU MR
size; it is made public as the following DDW code will be using it.

This changes the PHB reset handler to do the default DMA initialization
instead of spapr_phb_realize(). This does not make differenct now but
later with more than just one DMA window, we will have to remove them all
and create the default one on a system reset.

No visible change in behaviour is expected except the actual table
will be reallocated every reset. We might optimize this later.

The other way to implement this would be dynamically create/remove
the TCE table QOM objects but this would make migration impossible
as the migration code expects all QOM objects to exist at the receiver
so we have to have TCE table objects created when migration begins.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07 10:17:45 +10:00
Paolo Bonzini 741da0d38b hw: cannot include hw/hw.h from user emulation
All qdev definitions are available from other headers, user-mode
emulation does not need hw/hw.h.

By considering system emulation only, it is simpler to disentangle
hw/hw.h from NEED_CPU_H.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini cbd62f8616 hw: do not use VMSTATE_*TL
Reserve this to CPU state serialization.

Luckily, they were only used by sPAPR devices and these are ppc64
only.  So there is no change to migration format.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini 7d0c99a9d8 explicitly include qom/cpu.h
exec/cpu-all.h includes qom/cpu.h.  Explicit inclusion
will keep things working when cpu.h will not be included
indirectly almost everywhere (either directly or through
qemu-common.h).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini aa5a9e2484 ppc: use PowerPCCPU instead of CPUPPCState
This changes a cpu.h dependency for hw/ppc/ppc.h into a cpu-qom.h
dependency.  For it to compile we also need to clean up a few unused
definitions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Michael Roth f40eb921da spapr_drc: enable immediate detach for unsignalled devices
Currently spapr doesn't support "aborting" hotplug of PCI
devices by allowing device_del to immediately remove the
device if we haven't signalled the presence of the device
to the guest.

In the past this wasn't an issue, since we always immediately
signalled device attach and simply relied on full guest-aware
add->remove path for device removal. However, as of 788d259,
we now defer signalling for PCI functions until function 0
is attached, so now we need to deal with these "abort" operations
for cases where a user hotplugs a non-0 function, then opts to
remove it prior hotplugging function 0. Currently they'd have to
reboot before the unplug completed. PCIe multifunction hotplug
does not have this requirement however, so from a management
implementation perspective it would be good to address this within
the same release as 788d259.

We accomplish this by simply adding a 'signalled' flag to track
whether a device hotplug event has been sent to the guest. If it
hasn't, we allow immediate removal under the assumption that the
guest will not be using the device. Devices present at boot/reset
time are also assumed to be 'signalled'.

For CPU/memory/etc, signalling will still happen immediately
as part of device_add, so only PCI functions should be affected.

Cc: bharata@linux.vnet.ibm.com
Cc: david@gibson.dropbear.id.au
Cc: sbhat@linux.vnet.ibm.com
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: This fixes a regression where an incorrect hot-add of a non-zero
      function can no longer be backed out until function 0 is added]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-05 10:47:03 +10:00
Cédric Le Goater 5c94b2a5e5 ppc: Rework POWER7 & POWER8 exception model
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

This patch fixes the current AIL implementation for POWER8. The
interrupt vector address can be calculated directly from LPCR when the
exception is handled. The excp_prefix update becomes useless and we
can cleanup the H_SET_MODE hcall.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: Removed LPES0/1 handling for HV vs. !HV
      Fixed LPCR_ILE case for POWERPC_EXCP_POWER8 ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
[dwg: This was written as a cleanup, but it also fixes a real bug
      where setting an alternative interrupt location would not be
      correctly migrated]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-05 10:38:24 +10:00
Markus Armbruster daf015ef5a include/qemu/iov.h: Don't include qemu-common.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

qemu/iov.h includes qemu-common.h for QEMUIOVector stuff.  Move all
that to qemu/iov.h and drop the ill-advised include.  Include
qemu/iov.h where the QEMUIOVector stuff is now missing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Greg Kurz a005b3ef50 xics: report errors with the QEMU Error API
Using the return value to report errors is error prone:
- xics_alloc() returns -1 on error but spapr_vio_busdev_realize() errors
  on 0
- xics_alloc_block() returns the unclear value of ics->offset - 1 on error
  but both rtas_ibm_change_msi() and spapr_phb_realize() error on 0

This patch adds an errp argument to xics_alloc() and xics_alloc_block() to
report errors. The return value of these functions is a valid IRQ number
if errp is NULL. It is undefined otherwise.

The corresponding error traces get promotted to error messages. Note that
the "can't allocate IRQ" error message in spapr_vio_busdev_realize() also
moves to xics_alloc(). Similar error message consolidation isn't really
applicable to xics_alloc_block() because callers have extra context (device
config address, MSI or MSIX).

This fixes the issues mentioned above.

Based on previous work from Brian W. Hart.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-28 16:19:02 +11:00
David Gibson 715c54071a pseries: Simplify handling of the hash page table fd
When migrating the 'pseries' machine type with KVM, we use a special fd
to access the hash page table stored within KVM.  Usually, this fd is
opened at the beginning of migration, and kept open until the migration
is complete.

However, if there is a guest reset during the migration, the fd can become
stale and we need to re-open it.  At the moment we use an 'htab_fd_stale'
flag in sPAPRMachineState to signal this, which is checked in the migration
iterators.

But that's rather ugly.  It's simpler to just close and invalidate the
fd on reset, and lazily re-open it in migration if necessary.  This patch
implements that change.

This requires a small addition to the machine state's instance_init,
so that htab_fd is initialized to -1 (telling the migration code it
needs to open it) instead of 0, which could be a valid fd.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-02-17 09:59:30 +11:00
David Gibson f201987b84 spapr: Remove rtas_st_buffer_direct()
rtas_st_buffer_direct() is a not particularly useful wrapper around
cpu_physical_memory_write().  All the callers are in
rtas_ibm_configure_connector, where it's better handled by local helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-01-30 23:37:36 +11:00