Commit Graph

1294 Commits

Author SHA1 Message Date
David Gibson
ee76a09fc7 spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
This adds an spapr capability bit for Hardware Transactional Memory.  It is
enabled by default for pseries-2.11 and earlier machine types. with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it).  However it is disabled by default for the latest pseries-2.12
machine type.

This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:

 * This way running with -M pseries,accel=tcg will start with whatever cpu
   and will provide the same guest visible model as with accel=kvm.
     - More specifically, this means existing make check tests don't have
       to be modified to use cap-htm=off in order to run with TCG

 * We hope to add a new "HTM without suspend" feature in the not too
   distant future which could work on both POWER8 and POWER9 cpus, and
   could be enabled by default.

 * Best guesses suggest that future POWER cpus may well only support the
   HTM-without-suspend model, not the (frankly, horribly overcomplicated)
   POWER8 style HTM with suspend.

 * Anecdotal evidence suggests problems with HTM being enabled when it
   wasn't wanted are more common than being missing when it was.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson
33face6b89 spapr: Capabilities infrastructure
Because PAPR is a paravirtual environment access to certain CPU (or other)
facilities can be blocked by the hypervisor.  PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.

In some places we automatically determine whether to make a feature
available based on whether our host can support it, in most cases this is
based on limitations in the available KVM implementation.

Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generaly reducing reproducibility, it means that a migration
between different host environments can easily go bad.

We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability.  But, it's still not right and some limitations on
POWER9 is going to make it more of an issue in future.

This introduces an infrastructure for defining "sPAPR capabilities".  These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overriden with machine properties.

The intention is at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations).  If not we simply fail, rather than silently modifying the
advertised featureset to the guest.

This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
Michael S. Tsirkin
acc95bc850 Merge remote-tracking branch 'origin/master' into HEAD
Resolve conflicts around apb.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-11 22:03:50 +02:00
David Gibson
51f84465dd spapr: Correct compatibility mode setting for hotplugged CPUs
Currently the pseries machine sets the compatibility mode for the
guest's cpus in two places: 1) at machine reset and 2) after CAS
negotiation.

This means that if we set or negotiate a compatiblity mode, then
hotplug a cpu, the hotplugged cpu doesn't get the right mode set and
will incorrectly have the full native features.

To correct this, we set the compatibility mode on a cpu when it is
brought online with the 'start-cpu' RTAS call.  Given that we no
longer need to set the compatibility mode on all CPUs at machine
reset, so we change that to only set the mode for the boot cpu.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2018-01-10 12:53:00 +11:00
Thomas Huth
a716766889 hw/ppc: Remove the deprecated spapr-pci-vfio-host-bridge device
It's a deprecated dummy device since QEMU v2.6.0. That should have
been enough time to allow the users to update their scripts in case
they still use it, so let's remove this legacy code now.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-10 12:53:00 +11:00
Cédric Le Goater
a6a444a87a target/ppc: more use of the PPC_*() macros
Also introduce utilities to manipulate bitmasks (originaly from OPAL)
which be will be used in the model of the XIVE interrupt controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-10 12:53:00 +11:00
Cédric Le Goater
b168a138a8 ppc/pnv: change powernv_ prefix to pnv_ for overall naming consistency
The 'pnv' prefix is now used for all and the routines populating the
device tree start with 'pnv_dt'. The handler of the PnvXScomInterface
is also renamed to 'dt_xscom' which should reflect that it is
populating the device tree under the 'xscom@' node of the chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-10 12:53:00 +11:00
Greg Kurz
2b3db9dd34 spapr_pci: use warn_report()
These two are definitely warnings. Let's use the appropriate API.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-10 12:52:59 +11:00
Philippe Mathieu-Daudé
489983d6b4 hw/net/ne2000: extract ne2k-isa code from i386/pc to ne2000-isa.c
- add "hw/net/ne2000-isa.h"
- remove the old i386 dependency

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> [PPC]
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé
6c646a11bf hw/timer/mc146818: rename rtc_init() -> mc146818_rtc_init()
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé
1945e6ab47 ppc: remove duplicated includes
applied using ./scripts/clean-includes

not needed since 7ebaf79556

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé
e9808d0969 hw: use "qemu/osdep.h" as first #include in source files
applied using ./scripts/clean-includes

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-12-18 17:07:02 +03:00
Laurent Vivier
1481fe5fcf spapr: don't initialize PATB entry if max-cpu-compat < power9
if KVM is enabled and KVM capabilities MMU radix is available,
the partition table entry (patb_entry) for the radix mode is
initialized by default in ppc_spapr_reset().

It's a problem if we want to migrate the guest to a POWER8 host
while the kernel is not started to set the value to the one
expected for a POWER8 CPU.

The "-machine max-cpu-compat=power8" should allow to migrate
a POWER9 KVM host to a POWER8 KVM host, but because patb_entry
is set, the destination QEMU tries to enable radix mode on the
POWER8 host. This fails and cancels the migration:

    Process table config unsupported by the host
    error while loading state for instance 0x0 of device 'spapr'
    load of migration failed: Invalid argument

This patch doesn't set the PATB entry if the user provides
a CPU compatibility mode that doesn't support radix mode.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:50:29 +11:00
David Gibson
4f441474c6 spapr: Assume msi_nonbroken
We conditionally adjust part of the guest device tree based on the
global msi_nonbroken flag.  However, the main machine type code
initializes msi_nonbroken to true and there's nothing that would set
it to false again.

So replace the test with an assert().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-12-15 09:49:24 +11:00
David Gibson
bcb5ce08cf spapr: Rename machine init functions for clarity
Machine objects have two init functions - the generic QOM level
instance_init which should only do static object initialization, and
the Machine specific MachineClass::init which does the actual
construction of the machine.

In spapr the functions implementing these two have names -
ppc_machine_initfn() and ppc_spapr_init() - which don't correspond closely
to either of those.  To prevent people (read, me) from confusing which is
which, rename them spapr_instance_init() and spapr_machine_init() to
make it clearer which is which.

While we're there rename ppc_spapr_reset() to spapr_machine_reset() to
match.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
2017-12-15 09:49:24 +11:00
Greg Kurz
638f2caa01 spapr_events: drop bogus cell from "interrupt-ranges" property
According to LoPAPR 1.1 B.6.12, the "/event-sources" node has an "interrupt-
ranges" property, the format of which is described in B.6.9.1.2 as follows:

“interrupt-ranges”
 Standard property name that defines the interrupt number(s) and range(s)
 handled by this unit.

 prop-encoded-array: List of (int-number, range) specifications.

 Int-number is encoded as with encode-int.
 Range is encoded as with encode-int.

 The first entry in this list shall contain the int-number associated with
 the first “reg” property entry. The int-num-ber is the value representing
 the interrupt source as would appear in the PowerPC External Interrupt
 Architecture XISR. The range shall be the number of sequential interrupt
 numbers which this unit can generate.

There's no such thing as a cell count at the end of the array, like the
one introduced by commit ffbb1705a3 in QEMU 2.8. It doesn't seem it had
any impact on existing guests and I couldn't find any related workaround
in linux. So, let's just drop the bogus lines.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Greg Kurz
bb2d8ab636 spapr: fix LSI interrupt specifiers in the device tree
LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the
PowerPC External Interrupt Source Controller node as follows:

“#interrupt-cells”

  Standard property name to define the number of cells in an interrupt-
  specifier within an interrupt domain.

  prop-encoded-array: An integer, encoded as with encode-int, that denotes
  the number of cells required to represent an interrupt specifier in its
  child nodes.

  The value of this property for the PowerPC External Interrupt option shall
  be 2. Thus all interrupt specifiers (as used in the standard “interrupts”
  property) shall consist of two cells, each containing an integer encoded
  as with encode-int. The first integer represents the interrupt number the
  second integer is the trigger code: 0 for edge triggered, 1 for level
  triggered.

This patch fixes the interrupt specifiers in the "interrupt-map" property
of the PHB node, that were setting the second cell to 8 (confusion with
IRQ_TYPE_LEVEL_LOW ?) instead of 1.

VIO devices and RTAS event sources use the same format for interrupt
specifiers: while here, we introduce a common helper to handle the
encoding details.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
--
v3: - reference public LoPAPR instead of internal PAPR+ in changelog
    - change helper name to spapr_dt_xics_irq()

v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes
    - introduce a common helper to encode interrupt specifiers
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Igor Mammedov
f47bd1c839 spapr: replace numa_get_node() with lookup in pc-dimm list
SPAPR is the last user of numa_get_node() and a bunch of
supporting code to maintain numa_info[x].addr list.

Get LMB node id from pc-dimm list, which allows to
remove ~80LOC maintaining dynamic address range
lookup list.

It also removes pc-dimm dependency on numa_[un]set_mem_node_id()
and makes pc-dimms a sole source of information about which
node it belongs to and removes duplicate data from global
numa_info.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
7718375584 spapr: introduce a spapr_qirq() helper
xics_get_qirq() is only used by the sPAPR machine. Let's move it there
and change its name to reflect its scope. It will be useful for XIVE
support which will use its own set of qirqs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
9e7dc5fc2e spapr: introduce a spapr_irq_set_lsi() helper
It will make synchronisation easier with the XIVE interrupt mode when
available. The 'irq' parameter refers to the global IRQ number space.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
60c6823b9b spapr: move the IRQ allocation routines under the machine
Also change the prototype to use a sPAPRMachineState and prefix them
with spapr_irq_. It will let us synchronise the IRQ allocation with
the XIVE interrupt mode when available.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
ed0c37eedf ppc/xics: assign of the CPU 'intc' pointer under the core
The 'intc' pointer of the CPU references the interrupt presenter in
the XICS interrupt mode. When the XIVE interrupt mode is available and
activated, the machine will need to reassign this pointer to reflect
the change.

Moving this assignment under the realize routine of the CPU will ease
the process when the interrupt mode is toggled.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
4f7a47beeb ppc/xics: introduce an icp_create() helper
The sPAPR and the PowerNV core objects create the interrupt presenter
object of the CPUs in a very similar way. Let's provide a common
routine in which we use the presenter 'type' as a child identifier.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
3fe4f0fc85 spapr/rtas: do not reset the MSR in stop-self command
When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.

The CPU is now also protected from the decrementer interrupt by the
LPCR:PECE* bits which are disabled in the 'stop-self' RTAS
call. Reseting the MSR is pointless.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
d6322252b3 spapr/rtas: fix reboot of a a SMP TCG guest
Just like for hot unplug CPUs, when a guest is rebooted, the secondary
CPUs can be awaken by the decrementer and start entering SLOF at the
same time the boot CPU is.

To be safe, let's disable on the secondaries all the exceptions which
can cause an exit while the CPU is in power-saving mode.

Based on previous work from Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Cédric Le Goater
9a94ee5bb1 spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.

If the DECR timer fires after 'stop-self' is called and before the CPU
'stop' state is reached, the nearly-dead CPU will have some work to do
and the guest will crash. This case happens very frequently with the
not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
occasionally fired but after 'stop' state, so no work is to be done
and the guest survives.

I suspect there is a race between the QEMU mainloop triggering the
timers and the TCG CPU thread but I could not quite identify the root
cause. To be safe, let's disable in the LPCR all the exceptions which
can cause an exit while the CPU is in power-saving mode and reenable
them when the CPU is started.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:24 +11:00
Michael Davidsaver
e75ce32a75 e500: name openpic and pci host bridge
Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:23 +11:00
Greg Kurz
94ad93bd97 spapr_cpu_core: instantiate CPUs separately
The current code assumes that only the CPU core object holds a
reference on each individual CPU object, and happily frees their
allocated memory when the core is unrealized. This is dangerous
as some other code can legitimely keep a pointer to a CPU if it
calls object_ref(), but it would end up with a dangling pointer.

Let's allocate all CPUs with object_new() and let QOM free them
when their reference count reaches zero. This greatly simplify the
code as we don't have to fiddle with the instance size anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:23 +11:00
David Gibson
2b6154120c spapr: Add pseries-2.12 machine type
While we're at it fix a couple of small errors in the 2.11 and 2.10 models
(they didn't have any real effect, but don't quite match the template).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15 09:49:23 +11:00
David Gibson
fd56e0612b pci: Eliminate redundant PCIDevice::bus pointer
The bus pointer in PCIDevice is basically redundant with QOM information.
It's always initialized to the qdev_get_parent_bus(), the only difference
is the type.

Therefore this patch eliminates the field, instead creating a pci_get_bus()
helper to do the type mangling to derive it conveniently from the QOM
Device object underneath.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-12-05 19:13:45 +02:00
David Gibson
1115ff6d26 pci: Rename root bus initialization functions for clarity
pci_bus_init(), pci_bus_new_inplace(), pci_bus_new() and pci_register_bus()
are misleadingly named.  They're not used for initializing *any* PCI bus,
but only for a root PCI bus.

Non-root buses - i.e. ones under a logical PCI to PCI bridge - are instead
created with a direct qbus_create_inplace() (see pci_bridge_initfn()).

This patch renames the functions to make it clear they're only used for
a root bus.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-12-05 19:13:45 +02:00
David Gibson
768a20f3a4 spapr: Include "pre-plugged" DIMMS in ram size calculation at reset
At guest reset time, we allocate a hash page table (HPT) for the guest
based on the guest's RAM size.  If dynamic HPT resizing is not available we
use the maximum RAM size, if it is we use the current RAM size.

But the "current RAM size" calculation is incorrect - we just use the
"base" ram_size from the machine structure.  This doesn't include any
pluggable DIMMs that are already plugged at reset time.

This means that if you try to start a 'pseries' machine with a DIMM
specified on the command line that's much larger than the "base" RAM size,
then the guest will get a woefully inadequate HPT.  This can lead to a
guest freeze during boot as it runs out of HPT space during initial MMU
setup.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2017-12-04 11:31:22 +11:00
Laurent Vivier
0c86b2df78 pseries: fix TCG migration
Migration of pseries is broken with TCG because
QEMU tries to restore KVM MMU state unconditionally.

The result is a SIGSEGV in kvm_vm_ioctl():

  #0  kvm_vm_ioctl (s=0x0, type=-2146390353)
      at qemu/accel/kvm/kvm-all.c:2032
  #1  0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>,
      radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>)
      at qemu/target/ppc/kvm.c:396
  #2  0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578
  #3  0x000000010059e4cc in vmstate_load_state (f=0x106230000,
      vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/migration/vmstate.c:165
  #4  0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>)
      at qemu/migration/savevm.c:748

This patch fixes the problem by not calling the KVM function with the
TCG mode.

Fixes: d39c90f5f3 ("spapr: Fix migration of Radix guests")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-30 13:57:51 +11:00
Suraj Jitindar Singh
ee4d9ecc36 target/ppc: Move setting of patb_entry on hash table init
The patb_entry is used to store the location of the process table in
guest memory. The msb is also used to indicate the mmu mode of the
guest, that is patb_entry & 1 << 63 ? radix_mode : hash_mode.

Currently we set this to zero in spapr_setup_hpt_and_vrma() since if
this function gets called then we know we're hash. However some code
paths, such as setting up the hpt on incoming migration of a hash guest,
call spapr_reallocate_hpt() directly bypassing this higher level
function. Since we assume radix if the host is capable this results in
the msb in patb_entry being left set so in spapr_post_load() we call
kvmppc_configure_v3_mmu() and tell the host we're radix which as
expected means addresses cannot be translated once we actually run the cpu.

To fix this move the zeroing of patb_entry into spapr_reallocate_hpt().

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-27 12:20:11 +11:00
Thomas Huth
bac658d1a4 hw/ppc/spapr: Fix virtio-scsi bootindex handling for LUNs >= 256
LUNs >= 256 have to be encoded with the so-called "flat space
addressing method" for virtio-scsi, where an additional bit has to
be set. SLOF already took care of this with the following commit:

 https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=f72a37713fea47da
 (see https://bugzilla.redhat.com/show_bug.cgi?id=1431584 for details)

But QEMU does not use this encoding yet for device tree paths
that have to be handed over to SLOF to deal with the "bootindex"
property, so SLOF currently fails to boot from virtio-scsi devices
with LUNs >= 256 in the right boot order. Fix it by using the bit
to indicate the "flat space addressing method" for LUNs >= 256.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-22 15:28:37 +11:00
Greg Kurz
8251248394 spapr: reset DRCs after devices
A DRC with a pending unplug request releases its associated device at
machine reset time.

In the case of LMB, when all DRCs for a DIMM device have been reset,
the DIMM gets unplugged, causing guest memory to disappear. This may
be very confusing for anything still using this memory.

This is exactly what happens with vhost backends, and QEMU aborts
with:

qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion
 `r >= 0' failed.

The issue is that each DRC registers a QEMU reset handler, and we
don't control the order in which these handlers are called (ie,
a LMB DRC will unplug a DIMM before the virtio device using the
memory on this DIMM could stop its vhost backend).

To avoid such situations, let's reset DRCs after all devices
have been reset.

Reported-by: Mallesh N. Koti <mallesh@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-20 10:10:56 +11:00
Suraj Jitindar Singh
7abd43baec target/ppc: Update setting of cpu features to account for compat modes
The device tree nodes ibm,arch-vec-5-platform-support and ibm,pa-features
are used to communicate features of the cpu to the guest operating
system. The properties of each of these are determined based on the
selected cpu model and the availability of hypervisor features.
Currently the compatibility mode of the cpu is not taken into account.

The ibm,arch-vec-5-platform-support node is used to communicate the
level of support for various ISAv3 processor features to the guest
before CAS to inform the guests' request. The available mmu mode should
only be hash unless the cpu is a POWER9 which is not in a prePOWER9
compat mode, in which case the available modes depend on the
accelerator and the hypervisor capabilities.

The ibm,pa-featues node is used to communicate the level of cpu support
for various features to the guest os. This should only contain features
relevant to the operating mode of the processor, that is the selected
cpu model taking into account any compat mode. This means that the
compat mode should be taken into account when choosing the properties of
ibm,pa-features and they should match the compat mode selected, or the
cpu model selected if no compat mode.

Update the setting of these cpu features in the device tree as described
above to properly take into account any compat mode. We use the
ppc_check_compat function which takes into account the current processor
model and the cpu compat mode.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-20 10:07:49 +11:00
Sam Bobroff
e05fba5004 target/ppc: correct htab shift for hash on radix
KVM HV will soon support running a guest in hash mode on a POWER9 host
running in radix mode (see [1]), however the guest currently fails to
boot.

This is because the "htab_shift" value (the size of the MMU's hash
table) is added to the device tree before KVM has had a chance to
change it. If the host is in hash mode, KVM does not need to change it
and so the problem is not seen, but when the host is in radix mode a
change is required and we see a problem.

To fix this, move the call spapr_setup_hpt_and_vrma() (where
htab_shift could be changed) up a little so that it's called before
spapr_h_cas_compose_response() (where htab_shift is added to the
device tree).

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>

[1] See http://www.spinics.net/lists/kvm-ppc/msg13057.html
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-14 10:28:32 +11:00
Michael Davidsaver
c91c187f71 e500: ppce500_init_mpic() return device instead of IRQ array
Actual number of interrupt pins isn't known
in ppce500_init_mpic() so a hardcoded number
was used, which causes a crash with older openpic.

Instead, return the DeviceState* and change ppce500_init()
to call qdev_get_gpio_in() to get only the irq pins
which are needed.

Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-08 13:21:37 +11:00
Greg Kurz
e7cca3e94f spapr_cpu_core: rewrite machine type sanity check
This makes the code easier to understand and it is consistent with what
we already do for PHBs.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Greg Kurz
f7d6bfcdc0 spapr_pci: fail gracefully with non-pseries machine types
QEMU currently crashes when the user tries to add an spapr-pci-host-bridge
on a non-pseries machine:

$ qemu-system-ppc64 -M ppce500 -device spapr-pci-host-bridge,index=1
hw/ppc/spapr_pci.c:1535:spapr_phb_realize:
Object 0x1003dacae60 is not an instance of type spapr-machine
Aborted (core dumped)

The same thing happens with the deprecated but still available child type
spapr-pci-vfio-host-bridge.

Fix both by checking the machine type with object_dynamic_cast().

Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
David Gibson
db50f280cf spapr: Correct RAM size calculation for HPT resizing
In order to prevent the guest from forcing the allocation of large amounts
of qemu memory (or host kernel memory, in the case of KVM HV), we limit
the size of Hashed Page Table (HPT) it is allowed to allocated, based on
its RAM size.

However, the current calculation is not correct: it only adds up the size
of plugged memory, ignoring the base memory size.  This patch corrects it.

While we're there, use get_plugged_memory_size() instead of directly
calling pc_existing_dimms_capacity().  The only difference is that it
will abort on failure, which is right: a failure here indicates something
wrong within qemu.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-10-17 10:34:01 +11:00
Igor Mammedov
beba5c0fe4 ppc: pnv: consolidate type definitions and batch register them
Use a new DEFINE_TYPES() helper to simplify type registration

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
40abf43f72 ppc: pnv: drop PnvChipClass::cpu_model field
deduce core type directly from chip type instead of
maintaining type mapping in PnvChipClass::cpu_model.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
7383af1edc ppc: pnv: define core types statically
pnv core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
35bdb9def2 ppc: pnv: drop PnvCoreClass::cpu_oc field
deduce cpu type directly from core type instead of
maintaining type mapping in PnvCoreClass::cpu_oc and doing
extra cpu_model parsing in pnv_core_class_init()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
7fd544d8a7 ppc: pnv: normalize core/chip type names
typically for cpus/core type names following convention is used

   new_type_prefix-superclass_typename

make PNV core/chip to follow common convention.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
4a12c699d3 ppc: pnv: use generic cpu_model parsing
use common cpu_model prasing in vl.c and set default cpu_model
using generic MachineClass::default_cpu_type.

Beside of switching to generic infrastructure it solves several
issues.

 * ppc_cpu_class_by_name() is used to deal with lower/upper case
   and alias translations into actual cpu type, which fixes
    '-M powernv -cpu power8' and '-M powernv -cpu power9_v1.0'
   usecases which error out with:
    'invalid CPU model 'FOO' for powernv machine'
 * allows to switch to lower-case typenames in pnv chip/core name
   (by convention typnames should be lower-case)
 * replace aliased names /power8, power9, .../ with exact cpu model
   names (i.e. typenames should be stable but aliases might decide to
   point to other cpu model withi family or changed by kvm). It will
   also help to simplify pnv_chip/core code and get rid of dependency
   on cpu_model parsing.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Updated to make DD2.0 as default POWER9 chip]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
2e9c10eba0 ppc: spapr: use generic cpu_model parsing
use generic cpu_model parsing introduced by
 (6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init())

it allows to:
  * replace sPAPRMachineClass::tcg_default_cpu with
    MachineClass::default_cpu_type
  * drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse
    one in vl.c
  * simplify spapr_get_cpu_core_type() by removing
    not needed anymore recurrsion since alias look up
    happens earlier at vl.c and spapr_get_cpu_core_type()
    works only with resulted from that cpu type.
  * spapr no more needs to parse/depend on being phased out
    MachineState::cpu_model, all tha parsing done by generic
    code and target specific callback.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Correct minor compile error]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
17be88a713 ppc: spapr: use cpu model names as tcg defaults instead of aliases
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov
5bbb264186 ppc: spapr: register 'host' core type along with the rest of core types
consolidate 'host' core type registration by moving it from
KVM specific code into spapr_cpu_core.c, similar like it's
done in x86 target.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
b51d3c8818 ppc: spapr: use cpu type name directly
replace sPAPRCPUCoreClass::cpu_class with cpu type name
since it were needed just to get that at points it were
accessed.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
44cd95e31a ppc: spapr: define core types statically
spapr core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
b8e999673b ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr()
there is a dedicated callback CPUClass::parse_features
which purpose is to convert -cpu features into a set of
global properties AND deal with compat/legacy features
that couldn't be directly translated into CPU's properties.

Create ppc variant of it (ppc_cpu_parse_featurestr) and
move 'compat=val' handling from spapr_cpu_core.c into it.
That removes a dependency of board/core code on cpu_model
parsing and would let to reuse common -cpu parsing
introduced by 6063d4c0

Set "max-cpu-compat" property only if it exists, in practice
it should limit 'compat' hack to spapr machine and allow
to avoid including machine/spapr headers in target/ppc/cpu.c

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
a1063aa8a5 ppc: spapr: replace ppc_cpu_parse_features() with cpu_parse_cpu_model()
ppc_cpu_parse_features() is doing practically the same thing as
generic cpu_parse_cpu_model(). So remove duplicated impl. and
reuse generic one.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
23ec69ecf9 ppc: 40p/prep: replace cpu_model with cpu_type
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
6bab8eaa95 ppc: virtex-ml507: replace cpu_model with cpu_type
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
9391b8c563 ppc: replace cpu_model with cpu_type on ref405ep,taihu boards
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
376d7a2abb ppc: bamboo: use generic cpu_model parsing
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
f4c6604e86 ppc: mac_oldworld: use generic cpu_model parsing
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
9dff4c07e1 ppc: mac_newworld: use generic cpu_model parsing
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov
59e816fd3e ppc: mpc8544ds/e500plat: use generic cpu_model parsing
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Daniel Henrique Barboza
2a129767eb hw/ppc/spapr.c: abort unplug_request if previous unplug isn't done
LMB removal is completed only when the spapr_lmb_release callback
is called after all DRCs of the dimm are detached. During this
time, it is possible that a unplug request for the same dimm
arrives, trying to detach DRCs that were detached by the guest
in the first unplug_request.

BQL doesn't help in this case - the lock will prevent any concurrent
removal from happening until the end of spapr_memory_unplug_request
only. What happens is that the second unplug_request ends up calling
spapr_drc_detach in a DRC that were detached already, causing an
assert error in spapr_drc_detach (e.g
https://bugs.launchpad.net/qemu/+bug/1718118).

spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the
spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left
to be detached by the guest. When there are no more DRCs left, this
structure is deleted and the pc-dimm unplug handler is called to
finish the process.

This patch reuses the sPAPRDIMMState to allow unplug_request to know
if there is an ongoing unplug process for a given dimm, aborting the
unplug request in this case, by doing the following changes:

- in spapr_lmb_release callback, move the dimm state removal to the
end, after pc-dimm unplug handler. With this change we can check for
the existence of the dimm state to see if the unplug process is
done.

- use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request
to check if the dimm state exists. If positive, there is an unplug
operation already in progress for this dimm, meaning that we should
abort it and warn the user about it.

Fixes: https://bugs.launchpad.net/qemu/+bug/1718118
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
David Gibson
1ed9c8af50 target/ppc: Add POWER9 DD2.0 model information
At the moment the only POWER9 model which is listed in qemu is v1.0 (aka
"DD1").  This is a very early (read, buggy) version which will never be
released to the public - it was included in qemu only for the convenience
of those doing bringup on the early silicon.  For bonus points, we actually
had its PVR incorrect in the table (0x004e0000 instead of 0x004e0100).  We
also never actually implemented the differences in behaviour (read, bugs)
that marked DD1 in qemu.

Now that we know the PVR for the substantially better v2.0 (DD2) chip,
include it and make it the default POWER9 in qemu.  For the time being we
leave the DD1 definition in place for the poor souls (read, me) who still
need to work with DD1 hardware.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Greg Kurz
827b17c468 spapr: sanity check size of the CAS buffer
The CAS buffer is provided by SLOF. A broken SLOF could pass a silly
size: either smaller than the diff header, in which case the current
code will try to allocate 16 Exabytes of memory and g_malloc0() will
abort, or bigger than the maximum memory provisioned for SLOF (ie,
40 Megabytes), which doesn't make sense. Both cases indicate that
SLOF has a bug.

Let's print out an explicit error message and exit since rebooting as
we do with other errors would only result in a reset loop.

Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix format specifier that broke 32-bit builds]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Greg Kurz
dc1b5eee86 spapr: fix OF word name in comment
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Greg Kurz
a4f3885c74 hw/ppc: use 0 instead of fdt_path_offset(fdt, "/")
The offset of the root node is guaranteed to be 0.

This doesn't fix anything, it's just trivial cleanup of the two
remaining places where this was done under hw/ppc.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Eduardo Habkost
fd3b02c889 pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices
Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of
TYPE_PCI_DEVICE, except:

1) The ones that already have INTERFACE_PCIE_DEVICE set:

* base-xhci
* e1000e
* nvme
* pvscsi
* vfio-pci
* virtio-pci
* vmxnet3

2) base-pci-bridge

Not all PCI bridges are Conventional PCI devices, so
INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes
that are actually Conventional PCI:

* dec-21154-p2p-bridge
* i82801b11-bridge
* pbm-bridge
* pci-bridge

The direct subtypes of base-pci-bridge not touched by this patch
are:

* xilinx-pcie-root: Already marked as PCIe-only.
* pcie-pci-bridge: Already marked as PCIe-only.
* pcie-port: all non-abstract subtypes of pcie-port are already
  marked as PCIe-only devices.

3) megasas-base

Not all megasas devices are Conventional PCI devices, so the
interface names are added to the subclasses registered by
megasas_register_types(), according to information in the
megasas_devices[] array.

"megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add
INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas".

Acked-by: Alberto Garcia <berto@igalia.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-15 05:54:43 +03:00
Peter Maydell
ab16152926 Migration pull 2017-09-27
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZy64HAAoJEAUWMx68W/3nTqwP/A5Gx4Qwkv5KKdpM0YLq//d+
 OODmzl7Ni3a5Up1ETqGdLb84estrgY+5DISp73Rkt4a5tbT7+XKrhb4qD+93NnTe
 zynY9in4C1jGxYm7YzeOhwSeIiuLZMTCLQlGdYw7/nunIFwkItUEvAFx3AG1WCJe
 2Mk0lvmg4LikruDDMdzqZaJu7h5RU5sQjA7SsyrTBdsN7tNWl3rKLYGXwgzv0uz5
 n2xkUgzvvnj1Bk/Adojkn05yxA86xKD/4rhFED9fjNVSjAGHMrHIWOJ70V26Cg5w
 3gJ+5mesWsH+erf0JFYv0S38SyFbmIOE39Nn13D/d0o1x89P8B8cgqbi3ADTKM77
 875wuIVnZzi2vIwVdxXQ9GHQ79cpXwr2fOfQ2rjT6Ll95K+u/MQG86fQiO0eJW+0
 KwQVCwwh+HmCUcCogMuxAc9+F8C8qolwCi/9QXwS2yLBElHKaWDIMyTce36cW9d7
 cZaKIOeSJUGNFoaWZnXN88MRuOYbdywTl+GddVAW3+VJCTYV2oi0o5fsTfxXy5AV
 y7uYo/pcSj2gSZJ5GairMlB6p5iXnE8yusi1e4ZKA1x1TaSHSb6zR59lRUFr+j/L
 JhUCfA85v5/elGqgkYp6UhSzFDJ2ID2oSEMQTIzfVrinOXtnf2KEh33YMbUH5qyo
 yHVEu12uPe9rE6A0vWlu
 =/+LV
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170927a' into staging

Migration pull 2017-09-27

# gpg: Signature made Wed 27 Sep 2017 14:56:23 BST
# gpg:                using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20170927a:
  migration: Route more error paths
  migration: Route errors up through vmstate_save
  migration: wire vmstate_save_state errors up to vmstate_subsection_save
  migration: Check field save returns
  migration: check pre_save return in vmstate_save_state
  migration: pre_save return int
  migration: disable auto-converge during bulk block migration

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-27 22:44:51 +01:00
Dr. David Alan Gilbert
44b1ff319c migration: pre_save return int
Modify the pre_save method on VMStateDescription to return an int
rather than void so that it potentially can fail.

Changed zillions of devices to make them return 0; the only
case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already
had an error_report/return case.

Note: If you add an error exit in your pre_save you must emit
an error_report to say why.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-27 11:35:59 +01:00
Mark Cave-Ayland
e451b85f1b macio: use object link between MACIO_IDE and MAC_DBDMA object
Using a standard QOM object link we can pass a reference to the MAC_DBDMA
controller to the MACIO_IDE object which removes the last external parameter
to macio_ide_register_dma().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
0fc84331d6 macio: pass channel into MACIOIDEState via qdev property
One of the reasons macio_ide_register_dma() needs to exist is because the
channel id isn't passed into the MACIO_IDE object. Pass in the channel id
using a qdev property to remove this requirement.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
1ec26c757d spapr: fix the value of SDR1 in kvmppc_put_books_sregs()
When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.

Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.

This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.

It allows to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Cédric Le Goater
15fcedb26f ppc/pnv: check for OPAL firmware file presence
and exit before uselessly trying to load it if the file does not
exists.

Issue discovered by Coverity Scan.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
30b3bc5aa9 spapr_pci: make index property mandatory
PHBs can be created with an index property, in which case the machine
code automatically sets all the MMIO windows at addresses derived from
the index. Alternatively, they can be manually created without index,
but the user has to provide addresses for all MMIO windows.

The non-index way happens to be more trouble than it's worth: it's
difficult to use, keeps requiring (potentially incompatible) changes
when some new parameter needs adding, and is awkward to check for
collisions. It currently even has a bug that prevents to use two
non-index PHBs because their child DRCs are all derived from the
same index == -1 value, and, thus, collide.

This patch hence makes the index property mandatory. As a consequence,
the PHB's memory regions and BUID are now always configured according
to the index, and it is no longer possible to set them from the command
line.

This DOES BREAK backwards compat, but we don't think the non-index
PHB feature was used in practice (at least libvirt doesn't) and the
simplification is worth it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
332f7721cb spapr: introduce helpers to migrate HPT chunks and the end marker
This consolidates some duplicated code in a dedicated helpers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
14b0d74887 ppc/kvm: generalize the use of kvmppc_get_htab_fd()
The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes()
and kvmppc_write_hpte().

This patch modifies kvmppc_get_htab_fd() so that it can be used
everywhere we need to access the in-kernel htab:
- add an index argument
  => only kvmppc_read_hptes() passes an actual index, all other users
     pass 0
- add an errp argument to propagate error messages to the caller.
  => spapr migration code prints the error
  => hpte helpers pass &error_abort to keep the current behavior
     of hw_error()

While here, this also fixes a bug in kvmppc_write_hpte() so that it
opens the htab fd for writing instead of reading as it currently does.
This never broke anything because we currently never call this code,
as explained in the changelog of commit c138593380:

"This support updating htab managed by the hypervisor. Currently
 we don't have any user for this feature. This actually bring the
 store_hpte interface in-line with the load_hpte one. We may want
 to use this when we want to emulate henter hcall in qemu for HV
 kvm."

The above is still true today.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
82be8e7394 ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error
When kvmppc_get_htab_fd() fails, its return value is propagated up to
qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy().
All savevm handlers expect to receive a negative errno on error.

Let's patch kvmppc_get_htab_fd() accordingly.

While here, let's change htab_load() in the spapr code to also
propagate the error, since it doesn't make sense to abort() if
we couldn't get the htab fd from KVM.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt
58b6283586 ppc: Fix OpenPIC model
Apple uses an IBM MPIC2A without timers, it has 64 sources.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt
4f7265ff17 ppc/ide/macio: Add missing registers
The timing register exists on all variants of MacIO IDE, we just
store and return its value.

The interrupts register only exists on KeyLargo but it doesn't
hurt to have it. The lack of this register causes MacOS X to
hangs under some circumstances.

Both are 32-bit only. The HW might support smaller access sizes
but no known OS uses them.

Because the core IDE subsystem doesn't provide us with a way
to query the main (level) interrupt state, nor do we have a way
to know that DBDMA issued a (edge) interrupt, we reflect both
through a private pair of qirq's in order to maintain the
register state.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt
3c0622897e ppc/mac: Advertise a high clock frequency for NewWorld Macs
We use 900Mhz, otherwise MacOS X 10.5 refuses to install.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
c8bd35260d ppc: QOMify g3beige machine
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
BALATON Zoltan
4c46f372b0 ppc4xx: Add more PLB registers
These registers are present in 440 SoCs (and maybe in others too) and
U-Boot accesses them when printing register info. We don't emulate
these but add them to avoid crashing when they are read or written.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Kamil Rytarowski
39d96847c9 Replace round_page() with TARGET_PAGE_ALIGN()
This change fixes conflict with the DragonFly BSD headers.

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Igor Mammedov
79e0793614 numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed
Calculating default node-ids for CPUs in possible_cpu_arch_ids()
is rather fragile since defaults calculation uses nb_numa_nodes but
callback might be potentially called early before all -numa CLI
options are parsed, which would lead to cpus assigned only upto
nb_numa_nodes at the time possible_cpu_arch_ids() is called.

Issue was introduced by
(7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus)
and for example CLI:
  -smp 4 -numa node,cpus=0 -numa node
would set props.node-id in possible_cpus array for every non
explicitly mapped CPU to the first node.

Issue is not visible to guest nor to mgmt interface due to
  1) implictly mapped cpus are forced to the first node in
     case of partial mapping
  2) in case of default mapping possible_cpu_arch_ids() is
     called after all -numa options are parsed (resulting
     in correct mapping).

However it's fragile to rely on late execution of
possible_cpu_arch_ids(), therefore add machine specific
callback that returns node-id for CPU and use it to calculate/
set defaults at machine_numa_finish_init() time when all -numa
options are parsed.

Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19 16:51:33 -03:00
Igor Mammedov
4482e05cbb cpu: make cpu_generic_init() abort QEMU on error
Almost every user of cpu_generic_init() checks for
returned NULL and then reports failure in a custom way
and aborts process.
Some users assume that call can't fail and don't check
for failure, though they should have checked for it.

In either cases cpu_generic_init() failure is fatal,
so instead of checking for failure and reporting
it various ways, make cpu_generic_init() report
errors in consistent way and terminate QEMU on failure.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1505318697-77161-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19 09:09:32 -03:00
Greg Kurz
d492a75cfe spapr_events: use QTAILQ_FOREACH_SAFE() in spapr_clear_pending_events()
QTAILQ_FOREACH_SAFE() must be used when removing the current element
inside the loop block.

This fixes a user-after-free error introduced by commit 5625817423
and reported by Coverity (CID 1381017).

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
3b2fcedd52 spapr_cpu_core: cleaning up qdev_get_machine() calls
This patch removes the qdev_get_machine() calls that are made
in spapr_cpu_core.c in situations where we can get an existing
pointer for the MachineState by either passing it as an argument
to the function or by using other already available pointers.

Credits to Daniel Henrique Barboza for the idea and the changelog
text.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
96dbc9af35 spapr_pci: don't create 64-bit MMIO window if we don't need to
When running a pseries-2.2 or older machine type, we get the following
lines in info mtree:

address-space: memory
...
ffffffffffffffff-ffffffffffffffff (prio 0, i/o): alias
 pci@800000020000000.mmio64-alias @pci@800000020000000.mmio
  ffffffffffffffff-ffffffffffffffff

address-space: cpu-memory
...
ffffffffffffffff-ffffffffffffffff (prio 0, i/o): alias
 pci@800000020000000.mmio64-alias @pci@800000020000000.mmio
  ffffffffffffffff-ffffffffffffffff

The same thing occurs when running a pseries-2.7 with

    -global spapr-pci-host-bridge.mem_win_size=2147483648

This happens because we always create a 64-bit MMIO window, even if
we didn't explicitely requested it (ie, mem64_win_size == 0) and the
32-bit window is below 2GiB. It doesn't seem to have an impact on the
guest though because spapr_populate_pci_dt() doesn't advertise the
bogus windows when mem64_win_size == 0.

Since these memory regions don't induce any state, we can safely
choose to not create them when their address is equal to -1,
without breaking migration from existing setups.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
1d36da769a spapr_pci: convert sprintf() to g_strdup_printf()
In order to follow a QEMU common practice.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
1bbadc759e spapr_cpu_core: fail gracefully with non-pseries machine types
Since commit 7cca3e466e ("ppc: spapr: Move VCPU ID calculation into
sPAPR"), QEMU aborts when started with a *-spapr-cpu-core device and
a non-pseries machine.

Let's rely on the already existing call to object_dynamic_cast() instead
of using the SPAPR_MACHINE() macro.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
9ba255365e spapr_pci: handle FDT creation errors with _FDT()
libfdt failures when creating the FDT should cause QEMU to terminate.

Let's use the _FDT() macro which does just that instead of propagating
the error to the caller. spapr_populate_pci_child_dt() no longer needs
to return a value in this case.

Note that, on the way, this get rids of the following nonsensical lines:

    g_assert(!ret);
    if (ret) {

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
99372e785e spapr_pci: use the common _FDT() helper
All other users in hw/ppc already consider an error when building
the FDT to be fatal, even on hotplug paths. There's no valid reason
for spapr_pci to behave differently. So let's used the common _FDT()
helper which terminates QEMU when libfdt fails.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Cédric Le Goater
30bf9ed168 spapr: fix CAS-generated reset
The OV5_MMU_RADIX_300 requires special handling in the CAS negotiation
process. It is cleared from the option vector of the guest before
evaluating the changes and re-added later. But, when testing for a
possible CAS reset :

    spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
                                        ov5_cas_old, spapr->ov5_cas);

the bit OV5_MMU_RADIX_300 will each time be seen as removed from the
previous OV5 set, hence generating a reset loop.

Fix this problem by also clearing the same bit in the ov5_cas_old set.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Cédric Le Goater
21f3f8db0e ppc/xive: fix OV5_XIVE_EXPLOIT bits
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bits definition which have
evolved in the latest specs.

The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only
 - 0b10 XIVE legacy or exploitation mode

The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
4c563d9df5 spapr: only update SDR1 once per-cpu during CAS
Commit b55d295e3e added the possibility to support HPT resizing with KVM.
In the case of PR, we need to pass the userspace address of the HPT to KVM
using the SDR1 slot.
This is handled by kvmppc_update_sdr1() which uses CPU_FOREACH() to update
all CPUs. It is hence not needed to call kvmppc_update_sdr1() for each CPU.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
549ce59e2b spapr_pci: use g_strdup_printf()
Building strings with g_strdup_printf() instead of snprintf() is
a QEMU common practice.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
d049bde69d spapr_pci: drop useless check in spapr_populate_pci_child_dt()
spapr_phb_get_loc_code() either returns a non-null pointer, or aborts
if g_strdup_printf() failed to allocate memory.

Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Grammatical fix to commit message]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Greg Kurz
8f68760561 spapr_pci: drop useless check in spapr_phb_vfio_get_loc_code()
g_strdup_printf() either returns a non-null pointer, or aborts if it
failed to allocate memory.

Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Grammatical fix to commit message]
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Daniel Henrique Barboza
c86c1affae hw/ppc/spapr.c: cleaning up qdev_get_machine() calls
This patch removes the qdev_get_machine() calls that are made in
spapr.c in situations where we can get an existing pointer for
the MachineState by either passing it as an argument to the function
or by using other already available pointers.

The following changes were made:

- spapr_node0_size: static function that is called two times:
at spapr_setup_hpt_and_vrma and ppc_spapr_init. In both cases we can
pass an existing MachineState pointer to it.

- spapr_build_fdt: MachineState pointer can be retrieved from
the existing sPAPRMachineState pointer.

- spapr_boot_set: the opaque in the first arg is a sPAPRMachineState
pointer as we can see inside ppc_spapr_init:

    qemu_register_boot_set(spapr_boot_set, spapr);

We can get a MachineState pointer from it.

- spapr_machine_device_plug and spapr_machine_device_unplug_request: the
MachineState, sPAPRMachineState, MachineClass and sPAPRMachineClass pointers
can all be retrieved from the HotplugHandler pointer.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00