Commit Graph

414 Commits

Author SHA1 Message Date
Juan Quintela 9907e842d7 migration: Rename save_live_setup() to save_setup()
We are going to use it now for more than save live regions.
Once there rename qemu_savevm_state_begin() to qemu_savevm_state_setup().

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170628095228.4661-2-quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-07-10 17:52:21 +01:00
David Gibson 4f9242fc93 spapr: Make DRC reset force DRC into known state
The reset handler for DRCs attempts several state transitions which are
subject to various checks and restrictions.  But at reset time we know
there is no guest, so we can ignore most of the usual sequencing rules and
just set the DRC back to a known state.  In fact, it's safer to do so.

The existing code also has several redundant checks for
drc->awaiting_release inside a block which has already tested that.  This
patch removes those and sets the DRC to a fixed initial state based only
on whether a device is currently plugged or not.

With DRCs correctly reset to a state based on device presence, we don't
need to force state transitions as cold plugged devices are processed.
This allows us to remove all the callers of the set_*_state() methods from
outside spapr_drc.c.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-30 14:03:32 +10:00
Daniel Henrique Barboza aca8bf9f1c hw/ppc/spapr.c: consecutive 'spapr->patb_entry = 0' statements
In ppc_spapr_reset(), if the guest is using HPT, the code was executing:

    } else {
        spapr->patb_entry = 0;
        spapr_setup_hpt_and_vrma(spapr);
    }

And, at the end of spapr_setup_hpt_and_vrma:

    /* We're setting up a hash table, so that means we're not radix */
    spapr->patb_entry = 0;

Resulting in spapr->patb_entry being assigned to 0 twice in a row.

Given that 'spapr_setup_hpt_and_vrma' is also called inside
'spapr_check_setup_free_hpt' of spapr_hcall.c, this trivial patch removes
the 'patb_entry = 0' assignment from the 'else' clause inside ppc_spapr_reset
to avoid this behavior.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
Greg Kurz 46f7afa370 spapr: fix migration of ICPState objects from/to older QEMU
Commit 5bc8d26de2 ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
This is an improvement since we no longer allocate ICPState objects
that will never be used. But it has the side-effect of breaking
migration of older machine types from older QEMU versions.

This patch allows spapr to register dummy "icp/server" entries to vmstate.
These entries use a dedicated VMStateDescription that can swallow and
discard state of an incoming migration stream, and that don't send anything
on outgoing migration.

As for real ICPState objects, the instance_id is the cpu_index of the
corresponding vCPU, which happens to be equal to the generated instance_id
of older machine types.

The machine can unregister/register these entries when CPUs are dynamically
plugged/unplugged.

This is only available for pseries-2.9 and older machines, thanks to a
compat property.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
Bharata B Rao d39c90f5f3 spapr: Fix migration of Radix guests
Fix migration of radix guests by ensuring that we issue
KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.

Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
Bharata B Rao 3a38429748 spapr: Add a "no HPT" encoding to HTAB migration stream
Add a "no HPT" encoding (using value -1) to the HTAB migration
stream (in the place of HPT size) when the guest doesn't allocate HPT.
This will help the target side to match target HPT with the source HPT
and thus enable successful migration.

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
David Gibson d5fc133eed ppc: Rework CPU compatibility testing across migration
Migrating between different CPU versions is a bit complicated for ppc.
A long time ago, we ensured identical CPU versions at either end by
checking the PVR had the same value.  However, this breaks under KVM
HV, because we always have to use the host's PVR - it's not
virtualized.  That would mean we couldn't migrate between hosts with
different PVRs, even if the CPUs are close enough to compatible in
practice (sometimes identical cores with different surrounding logic
have different PVRs, so this happens in practice quite often).

So, we removed the PVR check, but instead checked that several flags
indicating supported instructions matched.  This turns out to be a bad
idea, because those instruction masks are not architected information, but
essentially a TCG implementation detail.  So changes to qemu internal CPU
modelling can break migration - this happened between qemu-2.6 and
qemu-2.7.  That was addressed by 146c11f1 "target-ppc: Allow eventual
removal of old migration mistakes".

Now, verification of CPU compatibility across a migration basically doesn't
happen.  We simply ignore the PVR of the incoming migration, and hope the
cpu on the destination is close enough to work.

Now that we've cleaned up handling of processor compatibility modes
for pseries machine type, we can do better.  For new machine types
(pseries-2.10+) We allow migration if:

    * The source and destination PVRs are for the same type of CPU, as
      determined by CPU class's pvr_match function
OR  * When the source was in a compatibility mode, and the destination CPU
      supports the same compatibility mode

For older machine types we retain the existing behaviour - current CAS
code will usually set a compat mode which would break backwards
migration if we made them use the new behaviour. [Fixed from an
earlier version by Greg Kurz].

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30 14:03:31 +10:00
David Gibson 66d5c492dd pseries: Reset CPU compatibility mode
Currently, the CPU compatibility mode is set when the cpu is initialized,
then again when the guest negotiates features.  This means if a guest
negotiates a compatibility mode, then reboots, that compatibility mode
will be retained across the reset.

Usually that will get overridden when features are negotiated on the next
boot, but it's still not really correct.  This patch moves the initial set
up of the compatibility mode from cpu init to reset time.  The mode *is*
retained if the reboot was caused by the feature negotiation (it might
be important in that case, though it's unlikely).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30 14:03:31 +10:00
David Gibson 7843c0d60d pseries: Move CPU compatibility property to machine
Server class POWER CPUs have a "compat" property, which is used to set the
backwards compatibility mode for the processor.  However, this only makes
sense for machine types which don't give the guest access to hypervisor
privilege - otherwise the compatibility level is under the guest's control.

To reflect this, this removes the CPU 'compat' property and instead
creates a 'max-cpu-compat' property on the pseries machine.  Strictly
speaking this breaks compatibility, but AFAIK the 'compat' option was
never (directly) used with -device or device_add.

The option was used with -cpu.  So, to maintain compatibility, this
patch adds a hack to the cpu option parsing to strip out any compat
options supplied with -cpu and set them on the machine property
instead of the now deprecated cpu property.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30 14:03:31 +10:00
Peter Xu 15c3850325 migration: move skip_section_footers
Move it into MigrationState, revert its meaning and renaming it to
send_section_footer, with a property bound to it. Same trick is played
like previous patches.

Removing savevm_skip_section_footers().

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-9-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28 11:18:39 +02:00
Peter Xu 71dd4c1a56 migration: move skip_configuration out
It was in SaveState but now moved to MigrationState altogether, reverted
its meaning, then renamed to "send_configuration". Again, using
HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for
xen_init().

Removing savevm_skip_configuration().

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-8-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28 11:18:38 +02:00
Peter Xu 5272298c48 migration: move global_state.optional out
Put it into MigrationState then we can use the properties to specify
whether to enable storing global state.

Removing global_state_set_optional() since now we can use HW_COMPAT_2_3
for x86/power, and AccelClass.global_props for Xen.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-6-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28 11:18:38 +02:00
Marc-André Lureau 9ed442b8ae pc-dimm: use get_uint() for dimm properties
TYPE_PC_DIMM's property PC_DIMM_ADDR_PROP is defined with
DEFINE_PROP_UINT64().

TYPE_PC_DIMM's property PC_DIMM_NODE_PROP is defined with
DEFINE_PROP_UINT32().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-22-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-06-20 14:31:32 +02:00
Peter Maydell 735286a4f8 migration/next for 20170613
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZP6n5AAoJEPSH7xhYctcj04oQAJczMfc2X8vTwII6lN9klf+T
 Cy32B4WB8FBO9M7oJYD/yytJ3ibcLuMwKwTy/GGfaTspuYDI/HrplUD3Pt+trDPc
 fUxmTNjK9vE9foPAwOTSwTGsdOp5ICoZuDjHTj8gtHmfFLclDxxJMojtthMJ1Csc
 qn9oJzjLn3izn8C6CY6oXGnqOt6gy2lz+RqNKlve/bwxaVdQIXTXCVsLWwQZuj48
 VI9qAFw9TsgSBi9dlTYpVfdMvItO73SVYd2c1ETzL0YSNK3S/Yhpww7fyK8TQNpO
 Y8xXMMBMybHZej1ixHXh01CRmEnBZXpjLCIXnWwxQGXxTH8p7F+W1+lhDTL4IIXR
 Py0EwiPUj4sPyTW2htSnDBRtE1uHcJlDtsFAAmsEqfeASet7ueE2bkfKwWUftqTs
 GZ7ikseIb9F0eQKjecYcEfaLtYNn+0UflgVkimW1gXIeuO58VYLpa8vdiUV3eKJn
 UCDDHGYKf7QJQLpSzYWXGRT4HJOQvaCbJ0a03hKceYyLB6rJv96khajirbczKZ92
 cja0EJfDy5S9fBulWRveHKLUAFMrR3zA4DhlK0pb591uIs4iMcKH3egHQZpv0uf0
 iifWNI+AFuorhQfdhV2G4Zg1g/fwI2RRJK7HdBOklulUrcr0caPvjjGdbA3Q0Hf6
 u61pWdr+Yb3XPaqlC2AH
 =EFHC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170613' into staging

migration/next for 20170613

# gpg: Signature made Tue 13 Jun 2017 10:01:45 BST
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration/20170613:
  migration: Move migration.h to migration/
  migration: Move remaining exported functions to migration/misc.h
  migration: create global_state.c
  migration: ram_control_* are implemented in qemu_file
  migration: Commands are only used inside migration.c
  migration: Move constants to savevm.h
  migration: Move dump_vmsate_json_to_file() to misc.h
  migration: Split registration functions from vmstate.h
  migration: Move self_announce_delay() to misc.h
  migration: Remove MigrationState from migration_channel_incomming()
  ram: Now POSTCOPY_ACTIVE is the same that STATUS_ACTIVE
  ram: Print block stats also in the complete case
  migration: Don't try to set *errp directly
  migration: isolate return path on src

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-13 13:51:29 +01:00
Juan Quintela c4b63b7cc5 migration: Move remaining exported functions to migration/misc.h
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-06-13 11:00:45 +02:00
Juan Quintela 84a899de8c migration: create global_state.c
It don't belong anywhere else, just the global state where everybody
can stick other things.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-06-13 11:00:45 +02:00
Juan Quintela f2a8f0a631 migration: Split registration functions from vmstate.h
They are indpendent, and nowadays almost every device register things
with qdev->vmsd.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-06-13 11:00:44 +02:00
Greg Kurz ad265631c0 xics: introduce macros for ICP/ICS link properties
These properties are part of the XICS API. They deserve to appear
explicitely in the XICS header file.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-09 12:12:34 +10:00
Thomas Huth 4871dd4c3f hw/ppc/spapr: Adjust firmware name for PCI bridges
SLOF uses "pci" as name for PCI bridges nodes in the device tree instead
of "pci-bridges", so booting via bootindex from a device behind a PCI
bridge currently does not work since QEMU passes the wrong name in the
"qemu,boot-list" property. Fix it by changing the name of the PCI bridge
nodes to "pci" instead.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1459170
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-08 14:38:27 +10:00
David Gibson 0be4e88621 spapr: Change DRC attach & detach methods to functions
DRC objects have attach & detach methods, but there's only one
implementation.  Although there are some differences in its behaviour for
different DRC types, the overall structure is the same, so while we might
want different method implementations for some parts, we're unlikely to
want them for the top-level functions.

So, replace them with direct function calls.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:26 +10:00
David Gibson 454b580ae9 spapr: Don't misuse DR-indicator in spapr_recover_pending_dimm_state()
With some combinations of migration and hotplug we can lost temporary state
indicating how many DRCs (guest side hotplug handles) are still connected
to a DIMM object in the process of removal.  When we hit that situation
spapr_recover_pending_dimm_state() is used to scan more extensively and
work out the right number.

It does this using drc->indicator state to determine what state of
disconnection the DRC is in.  However, this is not safe, because the
indicator state is guest settable - in fact it's more-or-less a purely
guest->host notification mechanism which should have no bearing on the
internals of hotplug state management.

So, replace the test for this with a test on drc->dev, which is a purely
qemu side managed variable, and updated the same BQL critical section as
the indicator state.

This does introduce an off-by-one change, because the indicator state was
updated before the call to spapr_lmb_release() on the current DRC, whereas
drc->dev is updated afterwards.  That's corrected by always decrementing
the nr_lmbs value instead of only doing so in the case where we didn't
have to recover information.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:26 +10:00
Greg Kurz 8a9e0e7b89 spapr: fix memory leak in spapr_memory_pre_plug()
The string returned by object_property_get_str() is dynamically allocated.

(Spotted by Coverity, CID 1375942)

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-08 11:05:31 +10:00
Peter Maydell e02bbe1956 ppc patch queue 2017-06-06
Accumulated patches for ppc targets and the pseries machine type.
 
 The big thing in this batch is a start on a substantial cleanup of the
 pseries hotplug mechanisms, which were pretty confusing.  For now
 these shouldn't cause substantial behavioural changes, but I am hoping
 these lead to clearer code and eventually to fixes for the bugs we
 have in hotplug handling, particularly when hotplug and migration are
 combined.
 
 The remaining patches are mostly bugfixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZNhgSAAoJEGw4ysog2bOSERwP/A7T7UJ8XXWit9QXGCi+G83w
 +RUuHxjA9qFqrg1zYqFyLg3ctGl93Sxu7mzI5MOIKwAVXlTsE6+84TH7zBc18DPB
 fekPWmzJ6jfiVO+1Zg1JPWorMfIHDDc2v6Q6qPfD8KWbt02yPfrXbKlivQB4hVZ4
 Qb4VJdjZgBDcVy79xhcW5k6v8dVw8PdSyDmkQrBhccI0noLerhI41Mgt7QQaWQRH
 Le3ziexUpWelVCRQB0FqE/PIWo2+NY/e0pumX7Aqtjs/G35KjOXy0ja3yKLjfeUW
 Z4NugIO2I2hncERa68YFar/BqG26DX8KCErNMDkn7LyZcoDAQWhcDH+65G1BNuf2
 jW+KApMNm+N1vXabbz8P9BbLjuZpRQQhyPOxB3I8UGaTYGtCPe/lUCe2/V8EbKNa
 VFavc1UuLftOZuJj/rYGJeU/4JBU6srbAKCO3VVK4Tnd8DyiT3QCpUWEkjv+J6jo
 co35oYBavLfQPMr+rsX15lgbmZwg7iBV+dgKLa2+cwmKXzCf7aYe38aJy7nRBmhb
 ivhH3bKtdysy0qq4UYaCgW06qQcVF0QMJaxFQ0X7I+GBNwHA7wdZD/i6IMcO6Z7H
 7gQdavBTdukgKb2+pVjR58H13ieHXuBxktonhOz70rvEDVa4xx8pxhnZlpSiH2ha
 RzpkhanrwEeECG6Lke/3
 =QDWB
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170606' into staging

ppc patch queue 2017-06-06

Accumulated patches for ppc targets and the pseries machine type.

The big thing in this batch is a start on a substantial cleanup of the
pseries hotplug mechanisms, which were pretty confusing.  For now
these shouldn't cause substantial behavioural changes, but I am hoping
these lead to clearer code and eventually to fixes for the bugs we
have in hotplug handling, particularly when hotplug and migration are
combined.

The remaining patches are mostly bugfixes.

# gpg: Signature made Tue 06 Jun 2017 03:48:50 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.10-20170606:
  spapr: Remove some non-useful properties on DRC objects
  spapr: Eliminate spapr_drc_get_type_str()
  spapr: Move configure-connector state into DRC
  spapr: Clean up spapr_dr_connector_by_*()
  spapr: Introduce DRC subclasses
  spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs
  spapr: Allow boot from vhost-*-scsi backends
  ppc/pnv: check the return value of fdt_setprop()
  spapr_nvram: Check return value from blk_getlength()
  target/ppc: Fixup set_spr error in h_register_process_table
  target-ppc: Fix openpic timer read register offset
  spapr: Make DRC get_index and get_type methods into plain functions
  spapr: Abolish DRC set_configured method
  spapr: Abolish DRC get_fdt method
  spapr: Move DRC RTAS calls into spapr_drc.c
  migration: Mark CPU states dirty before incoming migration/loadvm
  migration: remove register_savevm()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-06 14:30:06 +01:00
David Gibson b8fdd530be spapr: Move configure-connector state into DRC
Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector
structures which store intermediate state for the ibm,configure-connector
RTAS call.

This was an attempt to separate this state from the core of the DRC state.
However the configure connector process is intimately tied to the DRC
model, so there's really no point trying to have two levels of interface
here.

Moving the configure-connector state into its corresponding DRC allows
removal of a number of helpers for maintaining the anciliary list.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:24:17 +10:00
David Gibson fbf5539718 spapr: Clean up spapr_dr_connector_by_*()
* Change names to something less ludicrously verbose
 * Now that we have QOM subclasses for the different DRC types, use a QOM
   typename instead of a PAPR type value parameter

The latter allows removal of the get_type_shift() helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:24:08 +10:00
David Gibson 2d33581899 spapr: Introduce DRC subclasses
Currently we only have a single QOM type for all DRCs, but lots of
places where we switch behaviour based on the DRC's PAPR defined type.
This is a poor use of our existing type system.

So, instead create QOM subclasses for each PAPR defined DRC type.  We
also introduce intermediate subclasses for physical and logical DRCs,
a division which will be useful later on.

Instead of being stored in the DRC object itself, the PAPR type is now
stored in the class structure.  There are still many places where we
switch directly on the PAPR type value, but this at least provides the
basis to start to remove those.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:23:46 +10:00
Felipe Franciosi c4e13492af spapr: Allow boot from vhost-*-scsi backends
The current implementation of spapr_get_fw_dev_path() doesn't take into
consideration vhost-*-scsi devices. This makes said devices unbootable
on PPC as SLOF is unable to work out the path to scan boot disks.

This makes VMs bootable on spapr when using vhost-*-scsi by implementing
a disk path for VHostSCSICommon (which currently includes both
vhost-user-scsi and vhost-scsi).

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Mike Cui <cui@nutanix.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06 09:19:01 +10:00
David Gibson 0b55aa91c9 spapr: Make DRC get_index and get_type methods into plain functions
These two methods only have one implementation, and the spec they're
implementing means any other implementation is unlikely, verging on
impossible.

So replace them with simple functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06 08:53:24 +10:00
Igor Mammedov 99861ecbc5 spapr: cleanup spapr_fixup_cpu_numa_dt() usage
even though spapr_fixup_cpu_numa_dt() has no effect on FDT
if numa is disabled, don't call it uselessly. It makes it
obvious at call sites that function is needed only when numa
is enabled.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-7-git-send-email-imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05 14:59:09 -03:00
Igor Mammedov 15f8b14228 numa: move numa_node from CPUState into target specific classes
Move vcpu's associated numa_node field out of generic CPUState
into inherited classes that actually care about cpu<->numa mapping,
i.e: ARMCPU, PowerPCCPU, X86CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-6-git-send-email-imammedo@redhat.com>
[ehabkost: s/CPU is belonging to/CPU belongs to/ on comments]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05 14:59:09 -03:00
Igor Mammedov a0ceb640d0 numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com>
[ehabkost: Fix indentation]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05 14:59:08 -03:00
Daniel Henrique Barboza 16ee99805e hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release
When a LMB hot unplug starts, the current DRC LMB status is stored at
spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus
if a migration occurs in the middle of a LMB unplug the
spapr_lmb_release callback will lost track of the LMB unplug progress.

This patch implements a new recover function spapr_recover_pending_dimm_state
that is used inside spapr_lmb_release to recover this DRC LMB release
status that is lost during the migration.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Minor stylistic changes, simplify error handling]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza 318347234d hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque
The pointer drc->detach_cb is being used as a way of informing
the detach() function inside spapr_drc.c which cb to execute. This
information can also be retrieved simply by checking drc->type and
choosing the right callback based on it. In this context, detach_cb
is redundant information that must be managed.

After the previous spapr_lmb_release change, no detach_cb_opaques
are being used by any of the three callbacks functions. This is
yet another information that is now unused and, on top of that, can't
be migrated either.

This patch makes the following changes:

- removal of detach_cb_opaque. the 'opaque' argument was removed from
the callbacks and from the detach() function of sPAPRConnectorClass. The
attribute detach_cb_opaque of sPAPRConnector was removed.

- removal of detach_cb from the detach() call. The function pointer
detach_cb of sPAPRConnector was removed. detach() now uses a
switch(drc->type) to execute the apropriate callback. To achieve this,
spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
callbacks were made public to be visible inside detach().

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:33 +10:00
David Gibson 0cffce56ae hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState
The LMB DRC release callback, spapr_lmb_release(), uses an opaque
parameter, a sPAPRDIMMState struct that stores the current LMBs that
are allocated to a DIMM (nr_lmbs). After each call to this callback,
the nr_lmbs is decremented by one and, when it reaches zero, the callback
proceeds with the qdev calls to hot unplug the LMB.

Using drc->detach_cb_opaque is problematic because it can't be migrated in
the future DRC migration work. This patch makes the following changes to
eliminate the usage of this opaque callback inside spapr_lmb_release:

- sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
attribute called 'addr' was added to it. This is used as an unique
identifier to associate a sPAPRDIMMState to a PCDIMM element.

- sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
that are currently going under an unplug process.

- spapr_lmb_release() will now retrieve the nr_lmbs value by getting the
correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address
was created to fetch the address of a PCDIMM device inside spapr_lmb_release.
When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug
calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs.

After these changes, the opaque argument for spapr_lmb_release is now
unused and is passed as NULL inside spapr_del_lmbs. This and the other
opaque arguments can now be safely removed from the code.

As an additional cleanup made by this patch, the spapr_del_lmbs function
was merged with spapr_memory_unplug_request. The former was being called
only by the latter and both were small enough to fit one single function.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Minor stylistic cleanups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:28 +10:00
Laurent Vivier c871bc70bb spapr: add pre_plug function for memory
This allows to manage errors before the memory
has started to be hotplugged. We already have
the function for the CPU cores.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[dwg: Fixed a couple of style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 17:27:39 +10:00
David Gibson 459264ef24 pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types
As of pseries-2.7 and later, we require the total number of guest vcpus to
be a multiple of the threads-per-core.  pseries-2.6 and earlier machine
types, however, are supposed to allow this for the sake of migration from
old qemu versions which allowed this.

Unfortunately, 8149e29 "pseries: Enforce homogeneous threads-per-core"
broke this by not considering the old machine type case.  This fixes it by
only applying the check when the machine type supports hotpluggable cpus.
By not-entirely-coincidence, that corresponds to the same time when we
started enforcing total threads being a multiple of threads-per-core.

Fixes: 8149e2992f

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2017-05-24 11:39:53 +10:00
Greg Kurz 3d85885a1b spapr: fix error reporting in xics_system_init()
If the user explicitely asked for kernel-irqchip support and "xics-kvm"
initialization fails, we shouldn't fallback to emulated "xics" as we
do now. It is also awkward to print an error message when we have an
errp pointer argument.

Let's use the errp argument to report the error and let the caller decide.
This simplifies the code as we don't need a local Error * here.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:53 +10:00
Greg Kurz 07572c0653 spapr: ensure core_slot isn't NULL in spapr_core_unplug()
If we go that far on the path of hot-removing a core and we find out that
the core-id is invalid, then we have a serious bug.

Let's make it explicit with an assert() instead of dereferencing a NULL
pointer.

This fixes Coverity issue CID 1375404.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:53 +10:00
Bharata B Rao 06ec79e865 spapr: Consolidate HPT freeing code into a routine
Consolidate the code that frees HPT into a separate routine
spapr_free_hpt() as the same chunk of code is called from two places.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Greg Kurz 175d2aa038 spapr: sanitize error handling in spapr_ics_create()
The spapr_ics_create() function handles errors in a rather convoluted
way, with two local Error * variables. Moreover, failing to parent the
ICS object to the machine should be considered as a bug but it is
currently ignored.

This patch addresses both issues.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Greg Kurz f63ebfe0ac ppc/xics: simplify prototype of xics_spapr_init()
This function only does hypercall and RTAS-call registration, and thus
never returns an error. This patch adapt the prototype to reflect that.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Stefan Hajnoczi ba9915e1f8 x86 and machine queue, 2017-05-11
Highlights:
 * New "-numa cpu" option
 * NUMA distance configuration
 * migration/i386 vmstatification
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn
 RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j
 2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11
 q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT
 4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE
 lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W
 VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI
 /ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07
 plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ
 /QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC
 LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC
 CfdxWB5kYM6/lLbOHj94
 =48wH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging

x86 and machine queue, 2017-05-11

Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification

# gpg: Signature made Thu 11 May 2017 08:16:07 PM BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: Note: This key has expired!
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* ehabkost/tags/x86-and-machine-pull-request: (29 commits)
  migration/i386: Remove support for pre-0.12 formats
  vmstatification: i386 FPReg
  migration/i386: Remove old non-softfloat 64bit FP support
  tests: check -numa node,cpu=props_list usecase
  numa: add '-numa cpu,...' option for property based node mapping
  numa: remove node_cpu bitmaps as they are no longer used
  numa: use possible_cpus for not mapped CPUs check
  machine: call machine init from wrapper
  numa: remove no longer need numa_post_machine_init()
  tests: numa: add case for QMP command query-cpus
  QMP: include CpuInstanceProperties into query_cpus output output
  virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
  numa: mirror cpu to node mapping in MachineState::possible_cpus
  numa: add check that board supports cpu_index to node mapping
  virt-arm: add node-id property to CPU
  pc: add node-id property to CPU
  spapr: add node-id property to sPAPR core
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:12:03 +01:00
Igor Mammedov 722387e78d spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
it's safe to remove thread node_id != core node_id error
branch as machine_set_cpu_numa_node() also does mismatch
check and is called even before any CPU is created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-10-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov 0b8497f08c spapr: add node-id property to sPAPR core
it will allow switching from cpu_index to core based numa
mapping in follow up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov ea089eebbd numa: move source of default CPUs to NUMA node mapping into boards
Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.

As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.

In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Laurent Vivier 3bfe57165b numa: equally distribute memory on nodes
When there are more nodes than available memory to put the minimum
allowed memory by node, all the memory is put on the last node.

This is because we put (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
case the value is 0. This is particularly true with pseries,
as the memory must be aligned to 256MB.

To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute equally the memory on nodes.

We introduce numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and previous), the new one with all others.

Example:

qemu-system-ppc64 -S -nographic  -nodefaults -monitor stdio -m 1G -smp 8 \
                  -numa node -numa node -numa node \
                  -numa node -numa node -numa node

Before:

(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB

After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB

[1] https://en.wikipedia.org/wiki/Error_diffusion

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:47 -03:00
David Gibson 9bf502fe12 spapr: Don't accidentally advertise HTM support on POWER9
Logic in spapr_populate_pa_features() enables the bit advertising
Hardware Transactional Memory (HTM) in the guest's device tree only when
KVM advertises its availability with the KVM_CAP_PPC_HTM feature.

However, this assumes that the HTM bit is off in the base template used for
the device tree value.  That is true for POWER8, but not for POWER9.

It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA
3.0 MMU mode selection via CAS".

Fixes: 9fb4541f58

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh 545d6e2b5c target/ppc: Enable RADIX mmu mode for pseries TCG guest
Now that we have added all the infrastructure we can enable a pseries TCG
guest to use radix.

In order to do this we have to add the appropriate bits to the
ibm,arch-vec-5-platform-support vector to represent that we support both
hash and radix mmu models.

A radix guest can now be booted in pseries tcg mode by specifying:
-cpu POWER9

Note that we assume hash, that is we allocate a hpt, until a guest tells
us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in
which case we free the hpt. If we were right and the guest is hash then
there's nothing for us to do.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Cédric Le Goater 71cd4dace9 spapr: remove the 'nr_servers' field from the machine
xics_system_init() does not need 'nr_servers' anymore as it is only
used to define the 'interrupt-controller' node in the device tree. So
let's just compute the value when calling spapr_dt_xics().

This also gives us an opportunity to simplify the xics_system_init()
routine and introduce a specific spapr_ics_create() helper to create
the sPAPR ICS object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:41:55 +10:00
Cédric Le Goater 5bc8d26de2 spapr: allocate the ICPState object from under sPAPRCPUCore
Today, all the ICPs are created before the CPUs, stored in an array
under the sPAPR machine and linked to the CPU when the core threads
are realized. This modeling brings some complexity when a lookup in
the array is required and it can be simplified by allocating the ICPs
when the CPUs are.

This is the purpose of this proposal which introduces a new 'icp_type'
field under the machine and creates the ICP objects of the right type
(KVM or not) before the PowerPCCPU object are.

This change allows more cleanups : the removal of the icps array under
the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
helper.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00