qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Igor Mammedov	32a354dc6c	numa: forbid '-numa node, mem' for 5.1 and newer machine types Deprecation period is run out and it's a time to flip the switch introduced by `cd5ff8333a`. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200609135635.761587-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-26 09:39:39 -04:00
Markus Armbruster	934df91296	qdev: Make qdev_prop_set_drive() match the other helpers qdev_prop_set_drive() can fail. None of the other qdev_prop_set_FOO() can; they abort on error. To clean up this inconsistency, rename qdev_prop_set_drive() to qdev_prop_set_drive_err(), and create a qdev_prop_set_drive() that aborts on error. Coccinelle script to update callers: @ depends on !(file in "hw/core/qdev-properties-system.c")@ expression dev, name, value; symbol error_abort; @@ - qdev_prop_set_drive(dev, name, value, &error_abort); + qdev_prop_set_drive(dev, name, value); @@ expression dev, name, value, errp; @@ - qdev_prop_set_drive(dev, name, value, errp); + qdev_prop_set_drive_err(dev, name, value, errp); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200622094227.1271650-14-armbru@redhat.com>	2020-06-23 16:07:07 +02:00
Markus Armbruster	ce189ab230	qdev: Convert bus-less devices to qdev_realize() with Coccinelle All remaining conversions to qdev_realize() are for bus-less devices. Coccinelle script: // only correct for bus-less @dev! @@ expression errp; expression dev; @@ - qdev_init_nofail(dev); + qdev_realize(dev, NULL, &error_fatal); @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@ expression errp; expression dev; symbol true; @@ - object_property_set_bool(OBJECT(dev), true, "realized", errp); + qdev_realize(DEVICE(dev), NULL, errp); @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@ expression errp; expression dev; symbol true; @@ - object_property_set_bool(dev, true, "realized", errp); + qdev_realize(DEVICE(dev), NULL, errp); Note that Coccinelle chokes on ARMSSE typedef vs. macro in hw/arm/armsse.c. Worked around by temporarily renaming the macro for the spatch run. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200610053247.1583243-57-armbru@redhat.com>	2020-06-15 22:06:04 +02:00
Markus Armbruster	3c6ef471ee	sysbus: Convert to sysbus_realize() etc. with Coccinelle Convert from qdev_realize(), qdev_realize_and_unref() with null @bus argument to sysbus_realize(), sysbus_realize_and_unref(). Coccinelle script: @@ expression dev, errp; @@ - qdev_realize(DEVICE(dev), NULL, errp); + sysbus_realize(SYS_BUS_DEVICE(dev), errp); @@ expression sysbus_dev, dev, errp; @@ + sysbus_dev = SYS_BUS_DEVICE(dev); - qdev_realize_and_unref(dev, NULL, errp); + sysbus_realize_and_unref(sysbus_dev, errp); - sysbus_dev = SYS_BUS_DEVICE(dev); @@ expression sysbus_dev, dev, errp; expression expr; @@ sysbus_dev = SYS_BUS_DEVICE(dev); ... when != dev = expr; - qdev_realize_and_unref(dev, NULL, errp); + sysbus_realize_and_unref(sysbus_dev, errp); @@ expression dev, errp; @@ - qdev_realize_and_unref(DEVICE(dev), NULL, errp); + sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp); @@ expression dev, errp; @@ - qdev_realize_and_unref(dev, NULL, errp); + sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp); Whitespace changes minimized manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200610053247.1583243-46-armbru@redhat.com> [Conflicts in hw/misc/empty_slot.c and hw/sparc/leon3.c resolved]	2020-06-15 22:05:28 +02:00
Markus Armbruster	9fc7fc4d39	qom: Less verbose object_initialize_child() All users of object_initialize_child() pass the obvious child size argument. Almost all pass &error_abort and no properties. Tiresome. Rename object_initialize_child() to object_initialize_child_with_props() to free the name. New convenience wrapper object_initialize_child() automates the size argument, and passes &error_abort and no properties. Rename object_initialize_childv() to object_initialize_child_with_propsv() for consistency. Convert callers with this Coccinelle script: @@ expression parent, propname, type; expression child, size; symbol error_abort; @@ - object_initialize_child(parent, propname, OBJECT(child), size, type, &error_abort, NULL) + object_initialize_child(parent, propname, child, size, type, &error_abort, NULL) @@ expression parent, propname, type; expression child; symbol error_abort; @@ - object_initialize_child(parent, propname, child, sizeof(*child), type, &error_abort, NULL) + object_initialize_child(parent, propname, child, type) @@ expression parent, propname, type; expression child; symbol error_abort; @@ - object_initialize_child(parent, propname, &child, sizeof(child), type, &error_abort, NULL) + object_initialize_child(parent, propname, &child, type) @@ expression parent, propname, type; expression child, size, err; expression list props; @@ - object_initialize_child(parent, propname, child, size, type, err, props) + object_initialize_child_with_props(parent, propname, child, size, type, err, props) Note that Coccinelle chokes on ARMSSE typedef vs. macro in hw/arm/armsse.c. Worked around by temporarily renaming the macro for the spatch run. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> [Rebased: machine opentitan is new (commit `fe0fe4735e`)] Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200610053247.1583243-37-armbru@redhat.com>	2020-06-15 22:05:28 +02:00
Markus Armbruster	3e80f6902c	qdev: Convert uses of qdev_create() with Coccinelle This is the transformation explained in the commit before previous. Takes care of just one pattern that needs conversion. More to come in this series. Coccinelle script: @ depends on !(file in "hw/arm/highbank.c")@ expression bus, type_name, dev, expr; @@ - dev = qdev_create(bus, type_name); + dev = qdev_new(type_name); ... when != dev = expr - qdev_init_nofail(dev); + qdev_realize_and_unref(dev, bus, &error_fatal); @@ expression bus, type_name, dev, expr; identifier DOWN; @@ - dev = DOWN(qdev_create(bus, type_name)); + dev = DOWN(qdev_new(type_name)); ... when != dev = expr - qdev_init_nofail(DEVICE(dev)); + qdev_realize_and_unref(DEVICE(dev), bus, &error_fatal); @@ expression bus, type_name, expr; identifier dev; @@ - DeviceState dev = qdev_create(bus, type_name); + DeviceState dev = qdev_new(type_name); ... when != dev = expr - qdev_init_nofail(dev); + qdev_realize_and_unref(dev, bus, &error_fatal); @@ expression bus, type_name, dev, expr, errp; symbol true; @@ - dev = qdev_create(bus, type_name); + dev = qdev_new(type_name); ... when != dev = expr - object_property_set_bool(OBJECT(dev), true, "realized", errp); + qdev_realize_and_unref(dev, bus, errp); @@ expression bus, type_name, expr, errp; identifier dev; symbol true; @@ - DeviceState dev = qdev_create(bus, type_name); + DeviceState dev = qdev_new(type_name); ... when != dev = expr - object_property_set_bool(OBJECT(dev), true, "realized", errp); + qdev_realize_and_unref(dev, bus, errp); The first rule exempts hw/arm/highbank.c, because it matches along two control flow paths there, with different @type_name. Covered by the next commit's manual conversions. Missing #include "qapi/error.h" added manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200610053247.1583243-10-armbru@redhat.com> [Conflicts in hw/misc/empty_slot.c and hw/sparc/leon3.c resolved]	2020-06-15 22:00:10 +02:00
Markus Armbruster	981c3dcd94	qdev: Convert to qdev_unrealize() with Coccinelle For readability, and consistency with qbus_realize(). Coccinelle script: @ depends on !(file in "hw/core/qdev.c")@ typedef DeviceState; DeviceState *dev; symbol false, error_abort; @@ - object_property_set_bool(OBJECT(dev), false, "realized", &error_abort); + qdev_unrealize(dev); @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@ expression dev; symbol false, error_abort; @@ - object_property_set_bool(OBJECT(dev), false, "realized", &error_abort); + qdev_unrealize(DEVICE(dev)); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200610053247.1583243-8-armbru@redhat.com>	2020-06-15 21:36:30 +02:00
Leonardo Bras	0911a60c76	ppc/spapr: Add hotremovable flag on DIMM LMBs on drmem_v2 On reboot, all memory that was previously added using object_add and device_add is placed in this DIMM area. The new SPAPR_LMB_FLAGS_HOTREMOVABLE flag helps Linux to put this memory in the correct memory zone, so no unmovable allocations are made there, allowing the object to be easily hot-removed by device_del and object_del. This new flag was accepted in Power Architecture documentation. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Message-Id: <20200511200201.58537-1-leobras.c@gmail.com> [dwg: Fixed syntax error spotted by Cédric Le Goater] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-27 15:29:36 +10:00
Philippe Mathieu-Daudé	8e5c952b37	hw: Remove unnecessary DEVICE() cast The DEVICE() macro is defined as: #define DEVICE(obj) OBJECT_CHECK(DeviceState, (obj), TYPE_DEVICE) which expands to: ((DeviceState )object_dynamic_cast_assert((Object )(obj), (name), __FILE__, __LINE__, __func__)) This assertion can only fail when @obj points to something other than its stated type, i.e. when we're in undefined behavior country. Remove the unnecessary DEVICE() casts when we already know the pointer is of DeviceState type. Patch created mechanically using spatch with this script: @@ typedef DeviceState; DeviceState *s; @@ - DEVICE(s) + s Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paul Durrant <paul@xen.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: John Snow <jsnow@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20200512070020.22782-4-f4bug@amsat.org>	2020-05-15 07:08:52 +02:00
Markus Armbruster	b69c3c21a5	qdev: Unrealize must not fail Devices may have component devices and buses. Device realization may fail. Realization is recursive: a device's realize() method realizes its components, and device_set_realized() realizes its buses (which should in turn realize the devices on that bus, except bus_set_realized() doesn't implement that, yet). When realization of a component or bus fails, we need to roll back: unrealize everything we realized so far. If any of these unrealizes failed, the device would be left in an inconsistent state. Must not happen. device_set_realized() lets it happen: it ignores errors in the roll back code starting at label child_realize_fail. Since realization is recursive, unrealization must be recursive, too. But how could a partly failed unrealize be rolled back? We'd have to re-realize, which can fail. This design is fundamentally broken. device_set_realized() does not roll back at all. Instead, it keeps unrealizing, ignoring further errors. It can screw up even for a device with no buses: if the lone dc->unrealize() fails, it still unregisters vmstate, and calls listeners' unrealize() callback. bus_set_realized() does not roll back either. Instead, it stops unrealizing. Fortunately, no unrealize method can fail, as we'll see below. To fix the design error, drop parameter @errp from all the unrealize methods. Any unrealize method that uses @errp now needs an update. This leads us to unrealize() methods that can fail. Merely passing it to another unrealize method cannot cause failure, though. Here are the ones that do other things with @errp: * virtio_serial_device_unrealize() Fails when qbus_set_hotplug_handler() fails, but still does all the other work. On failure, the device would stay realized with its resources completely gone. Oops. Can't happen, because qbus_set_hotplug_handler() can't actually fail here. Pass &error_abort to qbus_set_hotplug_handler() instead. * hw/ppc/spapr_drc.c's unrealize() Fails when object_property_del() fails, but all the other work is already done. On failure, the device would stay realized with its vmstate registration gone. Oops. Can't happen, because object_property_del() can't actually fail here. Pass &error_abort to object_property_del() instead. * spapr_phb_unrealize() Fails and bails out when remove_drcs() fails, but other work is already done. On failure, the device would stay realized with some of its resources gone. Oops. remove_drcs() fails only when chassis_from_bus()'s object_property_get_uint() fails, and it can't here. Pass &error_abort to remove_drcs() instead. Therefore, no unrealize method can fail before this patch. device_set_realized()'s recursive unrealization via bus uses object_property_set_bool(). Can't drop @errp there, so pass &error_abort. We similarly unrealize with object_property_set_bool() elsewhere, always ignoring errors. Pass &error_abort instead. Several unrealize methods no longer handle errors from other unrealize methods: virtio_9p_device_unrealize(), virtio_input_device_unrealize(), scsi_qdev_unrealize(), ... Much of the deleted error handling looks wrong anyway. One unrealize methods no longer ignore such errors: usb_ehci_pci_exit(). Several realize methods no longer ignore errors when rolling back: v9fs_device_realize_common(), pci_qdev_unrealize(), spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(), virtio_device_realize(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-17-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Markus Armbruster	40c2281cc3	Drop more @errp parameters after previous commit Several functions can't fail anymore: ich9_pm_add_properties(), device_add_bootindex_property(), ppc_compat_add_property(), spapr_caps_add_properties(), PropertyInfo.create(). Drop their @errp parameter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-16-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Markus Armbruster	d2623129a7	qom: Drop parameter @errp of object_property_add() & friends The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]	2020-05-15 07:07:58 +02:00
Markus Armbruster	7eecec7d12	qom: Drop object_property_set_description() parameter @errp object_property_set_description() and object_class_property_set_description() fail only when property @name is not found. There are 85 calls of object_property_set_description() and object_class_property_set_description(). None of them can fail: * 84 immediately follow the creation of the property. * The one in spapr_rng_instance_init() refers to a property created in spapr_rng_class_init(), from spapr_rng_properties[]. Every one of them still gets to decide what to pass for @errp. 51 calls pass &error_abort, 32 calls pass NULL, one receives the error and propagates it to &error_abort, and one propagates it to &error_fatal. I'm actually surprised none of them violates the Error API. What are we gaining by letting callers handle the "property not found" error? Use when the property is not known to exist is simpler: you don't have to guard the call with a check. We haven't found such a use in 5+ years. Until we do, let's make life a bit simpler and drop the @errp parameter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-8-armbru@redhat.com> [One semantic rebase conflict resolved]	2020-05-15 07:06:49 +02:00
Peter Maydell	b894c6ed4a	ppc patch queue for 2020-04-07 First pull request for qemu-5.1. This includes: * Removal of all remaining cases where we had CAS triggered reboots * A number of improvements to NMI injection * Support for partition scoped radix translation in softmmu * Some fixes for NVDIMM handling * A handful of other minor fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl6zlgcACgkQbDjKyiDZ s5LhIQ//YRqYuR9JIcIjcL4qFKqk93RrE8KFoxY4Qri7+o6Zru1ATqpVru4tixpd YN0ntF3oMDV/uveQAG771n5iAX7TgbKiOaqIP/qnL6aUEtG4t3KvPhEIZr9Z3kkW eGL8vzObGlkTHJUdGbUaMrpxJZDLW9MADqTVa1PfDGThk3jKCcMqAInBQwFwNifY lAoHJi0SkF8i7ib6dT1Vp+EPw1SYmnLEFyrQU6+jshvxsb9FGNot0widQeSGCJme uolBiO63gxc4AjAt/5PvtAHe1SY9UGUheHp9hMSGoNrFfrCaMgheE8bOsS3MmPJ0 2kEIW4ZIq+CSqnlNlUciaPWn2X5INkXt+XAZyuTSbGC51yLGGpio5fn5CGdDL3wA +mefdJaYvfv5e5UuM38Lv6D7WyPczh2wIDvCOaJP4Lcr+yv0FOgSQOkd6LtnejqV tFqIAVpI7HeNUDmkt/dWRsje6L5gjfPzhA2c1Qm5r7pac4jQXu4POCFP964KXJ1W Ix7qaVOLVcNfSBbHKu79tRHRZjWDiK0SplrHfO6aSUJ/whJ2raT3O8DL9Rbj1M4/ QDYdMvockuwZRWZeYs1+A0LJ3LcPYVpVRvOjGpZEex8DQZ05+Elys33DMEM9MXpK fOiRu/Op286QxEKAkv/xaMMsJpYZ2k+AJXA+7nOCq0SNj0YvF0c= =INvG -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.1-20200507' into staging ppc patch queue for 2020-04-07 First pull request for qemu-5.1. This includes: * Removal of all remaining cases where we had CAS triggered reboots * A number of improvements to NMI injection * Support for partition scoped radix translation in softmmu * Some fixes for NVDIMM handling * A handful of other minor fixes # gpg: Signature made Thu 07 May 2020 06:00:55 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-5.1-20200507: target-ppc: fix rlwimi, rlwinm, rlwnm for Clang-9 spapr_nvdimm: Tweak error messages spapr_nvdimm.c: make 'label-size' mandatory target/ppc: Add support for Radix partition-scoped translation target/ppc: Rework ppc_radix64_walk_tree() for partition-scoped translation target/ppc: Extend ppc_radix64_check_prot() with a 'partition_scoped' bool target/ppc: Introduce ppc_radix64_xlate() for Radix tree translation spapr: Don't allow unplug of NVLink2 devices target/ppc: Assert if HV mode is set when running under a pseries machine target/ppc: Introduce a relocation bool in ppc_radix64_handle_mmu_fault() target/ppc: Enforce that the root page directory size must be at least 5 spapr: Drop CAS reboot flag spapr/cas: Separate CAS handling from rebuilding the FDT spapr: Simplify selection of radix/hash during CAS ppc/pnv: Add support for NMI interface ppc/spapr: tweak change system reset helper spapr: Don't check capabilities removed between CAS calls target/ppc: Improve syscall exception logging Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-07 10:55:12 +01:00
Greg Kurz	087820e37f	spapr: Drop CAS reboot flag The CAS reboot flag is false by default and all the locations that could set it to true have been dropped. This means that all code blocks depending on the flag being set is dead code and the other code blocks should be executed always. Just do that and drop the now uneeded CAS reboot flag. Fix a comment on the way to make checkpatch happy. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994893.478799.11772512888322840990.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Alexey Kardashevskiy	91067db1ab	spapr/cas: Separate CAS handling from rebuilding the FDT At the moment "ibm,client-architecture-support" ("CAS") is implemented in SLOF and QEMU assists via the custom H_CAS hypercall which copies an updated flatten device tree (FDT) blob to the SLOF memory which it then uses to update its internal tree. When we enable the OpenFirmware client interface in QEMU, we won't need to copy the FDT to the guest as the client is expected to fetch the device tree using the client interface. This moves FDT rebuild out to a separate helper which is going to be called from the "ibm,client-architecture-support" handler and leaves writing FDT to the guest in the H_CAS handler. This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200310050733.29805-3-aik@ozlabs.ru> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994229.478799.2178881312094922324.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Greg Kurz	b4b83312e7	spapr: Simplify selection of radix/hash during CAS The guest can select the MMU mode by setting bits 0-1 of byte 24 in OV5 to to 0b00 for hash or 0b01 for radix. As required by the architecture, we terminate the boot process if any other value is found there. The usual way to negotiate features in OV5 is basically ANDing the bitfield provided by the guest and the bitfield of features supported by QEMU, previously populated at machine init. For some not documented reason, MMU is treated differently : bit 1 of byte 24 (the radix/hash bit) is cleared from the guest OV5 and explicitely set in the final negotiated OV5 if radix was requested. Since the only expected input from the guest is the radix/hash bit being set or not, it seems more appropriate to handle this like we do for XIVE. Set the radix bit in spapr->ov5 at machine init if it has a chance to work (ie. power9, either TCG or a radix capable KVM) and rely exclusively on spapr_ovec_intersect() to set the radix bit in spapr->ov5_cas. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514993621.478799.4204740354545734293.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Nicholas Piggin	b5b7f39181	ppc/spapr: tweak change system reset helper Rather than have the helper take an optional vector address override, instead have its caller modify env->nip itself. This is more consistent when adding pnv nmi support, and also with mce injection added later. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20200325144147.221875-2-npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-05-07 11:10:50 +10:00
Cornelia Huck	541aaa1df8	hw: add compat machines for 5.1 Add 5.1 machine types for arm/i440fx/q35/s390x/spapr. Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20200429144605.7262-1-cohuck@redhat.com	2020-05-06 10:12:16 -04:00
Peter Maydell	b319df5537	ppc patch queue 2020-03-17 Here's my final pull request for the qemu-5.0 soft freeze. Sorry this is just under the wire - I hit some last minute problems that took a while to fix up and retest. Highlights are: * Numerous fixes for the FWNMI feature * A handful of cleanups to the device tree construction code * Numerous fixes for the spapr-vscsi device * A number of fixes and cleanups for real mode (MMU off) softmmu handling * Fixes for handling of the PAPR RMA * Better handling of hotplug/unplug events during boot * Assorted other fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl5wnnsACgkQbDjKyiDZ s5JdpQ//eY/AOTs09UhvKxt8DN7lC2WyHGxYSncb2Tj2zaJyPPX9p296IDBMw+KX Cafr6LzwLjpcpOyf/EWzg7qYGbNYoYgRWoOkHI/9pHsrIH3ZvhmnyTVQI5CffeEb EDDXJUQo/2sFpAGeODr5zz+zAQUGzt6ZZUxAiQAF9RYc9ohUGD2x5c86Asx6ZTZo /14bd3qnrcy1x+TxDetb1idFxFr2DsdYqpHAi88zHm+UaWzxYrb7kakd+YbqI24N tYryf5SdtGrWAAdF/7nq2PQJFzskx+t0QearU+ruovRydxYbUtBpkr5HauoVuQXR LiV270sDYDS/D1vvQQKzLxkUuvWmbZ0rB+2BAtS1rwq2sOKqYyQEAkTWfGtSXcf8 7fuZm2i1G78MuYGTOLCrF1u0owUB3QYHvt1NUW09GyWS8X3mahtj2fRe1RtPV/5d NL217bcd32fkMoGCg/lFvK9sCQzR6zJGKkJvOGMVW4ahHCLixpjIWabWtdXjfguT UahRPvlX7fzeVT+DISfjqyxwL+THnTvB3CTMWG2cktf0K1ke4SXcQ0mPyksN1NuC QocfPCr1TN2ri8g9dAPwQmOkojnNs9izpIWRYSl3avTJFNseNPxuHQALXj2Y3Y/O EoYxLN+cqPukQ1O3GxEj5QMKe8V/0986mxWnuS/dMohQOoy+zV4= =BPnR -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200317' into staging ppc patch queue 2020-03-17 Here's my final pull request for the qemu-5.0 soft freeze. Sorry this is just under the wire - I hit some last minute problems that took a while to fix up and retest. Highlights are: * Numerous fixes for the FWNMI feature * A handful of cleanups to the device tree construction code * Numerous fixes for the spapr-vscsi device * A number of fixes and cleanups for real mode (MMU off) softmmu handling * Fixes for handling of the PAPR RMA * Better handling of hotplug/unplug events during boot * Assorted other fixes # gpg: Signature made Tue 17 Mar 2020 09:55:07 GMT # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-5.0-20200317: (45 commits) pseries: Update SLOF firmware image ppc/spapr: Ignore common "ibm,nmi-interlock" Linux bug ppc/spapr: Implement FWNMI System Reset delivery target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector ppc/spapr: Allow FWNMI on TCG ppc/spapr: Fix FWNMI machine check interrupt delivery ppc/spapr: Add FWNMI System Reset state ppc/spapr: Change FWNMI names ppc/spapr: Fix FWNMI machine check failure handling spapr: Rename DT functions to newer naming convention spapr: Move creation of ibm,architecture-vec-5 property spapr: Move creation of ibm,dynamic-reconfiguration-memory dt node spapr/rtas: Reserve space for RTAS blob and log pseries: Update SLOF firmware image ppc/spapr: Move GPRs setup to one place target/ppc: Fix rlwinm on ppc64 spapr/xive: use SPAPR_IRQ_IPI to define IPI ranges exposed to the guest hw/scsi/spapr_vscsi: Convert debug fprintf() to trace event hw/scsi/spapr_vscsi: Prevent buffer overflow hw/scsi/spapr_vscsi: Do not mix SRP IU size with DMA buffer size ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-18 15:07:57 +00:00
Nicholas Piggin	0e236d3477	ppc/spapr: Implement FWNMI System Reset delivery PAPR requires that if "ibm,nmi-register" succeeds, then the hypervisor delivers all system reset and machine check exceptions to the registered addresses. System Resets are delivered with registers set to the architected state, and with no interlock. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20200316142613.121089-8-npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 17:00:22 +11:00
Nicholas Piggin	9aa2528070	target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector Provide for an alternate delivery location, -1 defaults to the architected address. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20200316142613.121089-7-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 17:00:22 +11:00
Nicholas Piggin	edfdbf9c6b	ppc/spapr: Add FWNMI System Reset state The FWNMI option must deliver system reset interrupts to their registered address, and there are a few constraints on the handler addresses specified in PAPR. Add the system reset address state and checks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20200316142613.121089-4-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviwed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 17:00:22 +11:00
Nicholas Piggin	8af7e1fe6f	ppc/spapr: Change FWNMI names The option is called "FWNMI", and it involves more than just machine checks, also machine checks can be delivered without the FWNMI option, so re-name various things to reflect that. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20200316142613.121089-3-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 17:00:22 +11:00
David Gibson	91335a5e15	spapr: Rename DT functions to newer naming convention In the spapr code we've been gradually moving towards a convention that functions which create pieces of the device tree are called spapr_dt_(). This patch speeds that along by renaming most of the things that don't yet match that so that they do. For now we leave the _dt_populate() functions which are actual methods used in the DRCClass::dt_populate method. While we're there we remove a few comments that don't really say anything useful. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2020-03-17 17:00:19 +11:00
David Gibson	1e0e11085a	spapr: Move creation of ibm,architecture-vec-5 property This is currently called from spapr_dt_cas_updates() which is a hang over from when we created this only as a diff to the DT at CAS time. Now that we fully rebuild the DT at CAS time, just create it along with the rest of the properties in /chosen. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-03-17 16:59:22 +11:00
David Gibson	fa523f0dd3	spapr: Move creation of ibm,dynamic-reconfiguration-memory dt node Currently this node with information about hotpluggable memory is created from spapr_dt_cas_updates(). But that's just a hangover from when we created it only as a diff to the device tree at CAS time. Now that we fully rebuild the DT as CAS time, it makes more sense to create this along with the rest of the memory information in the device tree. So, move it to spapr_populate_memory(). The patch is huge, but it's nearly all just code motion. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-03-17 15:08:50 +11:00
Alexey Kardashevskiy	4dba872219	spapr/rtas: Reserve space for RTAS blob and log At the moment SLOF reserves space for RTAS and instantiates the RTAS blob which is 20 bytes binary blob calling an hypercall. The rest of the RTAS area is a log which SLOF has no idea about but QEMU does. This moves RTAS sizing to QEMU and this overrides the size from SLOF. The only remaining problem is that SLOF copies the number of bytes it reserved (2KB for now) so QEMU needs to reserve at least this much; SLOF will be fixed separately to check that rtas-size from QEMU is enough for those 20 bytes for the H_RTAS hcall. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200316011841.99970-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 15:08:50 +11:00
Alexey Kardashevskiy	395a20d3cc	ppc/spapr: Move GPRs setup to one place At the moment "pseries" starts in SLOF which only expects the FDT blob pointer in r3. As we are going to introduce a OpenFirmware support in QEMU, we will be booting OF clients directly and these expect a stack pointer in r1, Linux looks at r3/r4 for the initramdisk location (although vmlinux can find this from the device tree but zImage from distro kernels cannot). This extends spapr_cpu_set_entry_state() to take more registers. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200310050733.29805-2-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 15:08:50 +11:00
David Gibson	425f0b7adb	spapr: Clean up RMA size calculation Move the calculation of the Real Mode Area (RMA) size into a helper function. While we're there clean it up and correct it in a few ways: * Add comments making it clearer where the various constraints come from * Remove a pointless check that the RMA fits within Node 0 (we've just clamped it so that it does) Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 15:08:47 +11:00
David Gibson	1052ab67f4	spapr: Don't clamp RMA to 16GiB on new machine types In spapr_machine_init() we clamp the size of the RMA to 16GiB and the comment saying why doesn't make a whole lot of sense. In fact, this was done because the real mode handling code elsewhere limited the RMA in TCG mode to the maximum value configurable in LPCR[RMLS], 16GiB. But, * Actually LPCR[RMLS] has been able to encode a 256GiB size for a very long time, we just didn't implement it properly in the softmmu * LPCR[RMLS] shouldn't really be relevant anyway, it only was because we used to abuse the RMOR based translation mode in order to handle the fact that we're not modelling the hypervisor parts of the cpu We've now removed those limitations in the modelling so the 16GiB clamp no longer serves a function. However, we can't just remove the limit universally: that would break migration to earlier qemu versions, where the 16GiB RMLS limit still applies, no matter how bad the reasons for it are. So, we replace the 16GiB clamp, with a clamp to a limit defined in the machine type class. We set it to 16 GiB for machine types 4.2 and earlier, but set it to 0 meaning unlimited for the new 5.0 machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 09:41:15 +11:00
David Gibson	8897ea5a9f	spapr: Don't attempt to clamp RMA to VRMA constraint The Real Mode Area (RMA) is the part of memory which a guest can access when in real (MMU off) mode. Of course, for a guest under KVM, the MMU isn't really turned off, it's just in a special translation mode - Virtual Real Mode Area (VRMA) - which looks like real mode in guest mode. The mechanics of how this works when using the hash MMU (HPT) put a constraint on the size of the RMA, which depends on the size of the HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we advertise to the guest based on this VRMA limit. There are several things wrong with this: 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum of Node 0 memory size and the VRMA limit. That will often work the same as clamping, but there can be other constraints on RMA size which supersede Node 0 memory size. We have real bugs caused by this (currently worked around in the guest kernel) 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where we're past the point that we can actually advertise an RMA limit to the guest 3) But most fundamentally, the VRMA limit depends on host configuration (page size) which shouldn't be visible to the guest, but this partially exposes it. This can cause problems with migration in certain edge cases, although we will mostly get away with it. In practice, this clamping is almost never applied anyway. With 64kiB pages and the normal rules for sizing of the HPT, the theoretical VRMA limit will be 4x(guest memory size) and so never hit. It will hit with 4kiB pages, where it will be (guest memory size)/4. However all mainstream distro kernels for POWER have used a 64kiB page size for at least 10 years. So, simply replace this logic with a check that the RMA we've calculated based only on guest visible configuration will fit within the host implied VRMA limit. This can break if running HPT guests on a host kernel with 4kiB page size. As noted that's very rare. There also exist several possible workarounds: * Change the host kernel to use 64kiB pages * Use radix MMU (RPT) guests instead of HPT * Use 64kiB hugepages on the host to back guest memory * Increase the guest memory size so that the RMA hits one of the fixed limits before the RMA limit. This is relatively easy on POWER8 which has a 16GiB limit, harder on POWER9 which has a 1TiB limit. * Use a guest NUMA configuration which artificially constrains the RMA within the VRMA limit (the RMA must always fit within Node 0). Previously, on KVM, we also temporarily reduced the rma_size to 256M so that the we'd load the kernel and initrd safely, regardless of the VRMA limit. This was a) confusing, b) could significantly limit the size of images we could load and c) introduced a behavioural difference between KVM and TCG. So we remove that as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-03-17 09:41:15 +11:00
David Gibson	6a84737c80	spapr,ppc: Simplify signature of kvmppc_rma_size() This function calculates the maximum size of the RMA as implied by the host's page size of structure of the VRMA (there are a number of other constraints on the RMA size which will supersede this one in many circumstances). The current interface takes the current RMA size estimate, and clamps it to the VRMA derived size. The only current caller passes in an arguably wrong value (it will match the current RMA estimate in some but not all cases). We want to fix that, but for now just keep concerns separated by having the KVM helper function just return the VRMA derived limit, and let the caller combine it with other constraints. We call the new function kvmppc_vrma_limit() to more clearly indicate its limited responsibility. The helper should only ever be called in the KVM enabled case, so replace its !CONFIG_KVM stub with an assert() rather than a dummy value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 09:41:15 +11:00
David Gibson	9943266ec3	spapr: Don't use weird units for MIN_RMA_SLOF MIN_RMA_SLOF records the minimum about of RMA that the SLOF firmware requires. It lets us give a meaningful error if the RMA ends up too small, rather than just letting SLOF crash. It's currently stored as a number of megabytes, which is strange for global constants. Move that megabyte scaling into the definition of the constant like most other things use. Change from M to MiB in the associated message while we're at it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 09:41:15 +11:00
Felipe Franciosi	64a7b8de42	qom/object: Use common get/set uint helpers Several objects implemented their own uint property getters and setters, despite them being straightforward (without any checks/validations on the values themselves) and identical across objects. This makes use of an enhanced API for object_property_add_uintXX_ptr() which offers default setters. Some of these setters used to update the value even if the type visit failed (eg. because the value being set overflowed over the given type). The new setter introduces a check for these errors, not updating the value if an error occurred. The error is propagated. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 23:02:24 +01:00
Philippe Mathieu-Daudé	ea0ac7f6f8	hw: Make MachineClass::is_default a boolean type There's no good reason for it to be type int, change it to bool. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200207161948.15972-3-philmd@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-02-28 14:57:19 -05:00
Paolo Bonzini	ca6155c0f2	Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD This series removes ad hoc RAM allocation API (memory_region_allocate_system_memory) and consolidates it around hostmem backend. It allows to * resolve conflicts between global -mem-prealloc and hostmem's "policy" option, fixing premature allocation before binding policy is applied * simplify complicated memory allocation routines which had to deal with 2 ways to allocate RAM. * reuse hostmem backends of a choice for main RAM without adding extra CLI options to duplicate hostmem features. A recent case was -mem-shared, to enable vhost-user on targets that don't support hostmem backends [1] (ex: s390) * move RAM allocation from individual boards into generic machine code and provide them with prepared MemoryRegion. * clean up deprecated NUMA features which were tied to the old API (see patches) - "numa: remove deprecated -mem-path fallback to anonymous RAM" - (POSTPONED, waiting on libvirt side) "forbid '-numa node,mem' for 5.0 and newer machine types" - (POSTPONED) "numa: remove deprecated implicit RAM distribution between nodes" Introduce a new machine.memory-backend property and wrapper code that aliases global -mem-path and -mem-alloc into automatically created hostmem backend properties (provided memory-backend was not set explicitly given by user). A bulk of trivial patches then follow to incrementally convert individual boards to using machine.memory-backend provided MemoryRegion. Board conversion typically involves: * providing MachineClass::default_ram_size and MachineClass::default_ram_id so generic code could create default backend if user didn't explicitly provide memory-backend or -m options * dropping memory_region_allocate_system_memory() call * using convenience MachineState::ram MemoryRegion, which points to MemoryRegion allocated by ram-memdev On top of that for some boards: * missing ram_size checks are added (typically it were boards with fixed ram size) * ram_size fixups are replaced by checks and hard errors, forcing user to provide correct "-m" values instead of ignoring it and continuing running. After all boards are converted, the old API is removed and memory allocation routines are cleaned up.	2020-02-25 09:19:00 +01:00
Alexey Kardashevskiy	87262806cb	spapr: Allow changing offset for -kernel image This allows moving the kernel in the guest memory. The option is useful for step debugging (as Linux is linked at 0x0); it also allows loading grub which is normally linked to run at 0x20000. This uses the existing kernel address by default. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200203032943.121178-6-aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Shivaprasad G Bhat	ee3a71e366	spapr: Add NVDIMM device support Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm device interface in QEMU to support virtual NVDIMM devices for Power. Create the required DT entries for the device (some entries have dummy values right now). The patch creates the required DT node and sends a hotplug interrupt to the guest. Guest is expected to undertake the normal DR resource add path in response and start issuing PAPR SCM hcalls. The device support is verified based on the machine version unlike x86. This is how it can be used .. Ex : For coldplug, the device to be added in qemu command line as shown below -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896 -device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0 For hotplug, the device to be added from monitor as below object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896 device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0 Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [Early implementation] Message-Id: <158131058078.2897.12767731856697459923.stgit@lep8c.aus.stglabs.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Michael S. Tsirkin	a784926819	ppc: function to setup latest class options We are going to add more init for the latest machine, so move the setup to a function so we don't have to change the DEFINE_SPAPR_MACHINE macro each time. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20200207064628.1196095-1-mst@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:03 +11:00
Igor Mammedov	ab74e54311	ppc/spapr: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200219160953.13771-68-imammedo@redhat.com>	2020-02-19 16:50:01 +00:00
Aravinda Prasad	e0aeef7a35	ppc: spapr: Activate the FWNMI functionality This patch sets the default value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine type 5.0. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200130184423.20519-8-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:11 +11:00
Aravinda Prasad	2500fb423a	migration: Include migration support for machine check handling This patch includes migration support for machine check handling. Especially this patch blocks VM migration requests until the machine check error handling is complete as these errors are specific to the source hardware and is irrelevant on the target hardware. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [Do not set FWNMI cap in post_load, now its done in .apply hook] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200130184423.20519-7-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:11 +11:00
Aravinda Prasad	9ac703ac5f	target/ppc: Handle NMI guest exit Memory error such as bit flips that cannot be corrected by hardware are passed on to the kernel for handling. If the memory address in error belongs to guest then the guest kernel is responsible for taking suitable action. Patch [1] enhances KVM to exit guest with exit reason set to KVM_EXIT_NMI in such cases. This patch handles KVM_EXIT_NMI exit. [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html (e20bbd3d and related commits) Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200130184423.20519-4-ganeshgr@linux.ibm.com> [dwg: #ifdefs to fix compile for 32-bit target] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:10 +11:00
Aravinda Prasad	9d953ce447	ppc: spapr: Introduce FWNMI capability Introduce fwnmi an spapr capability and add a helper function which tries to enable it, which would be used by following patch of the series. This patch by itself does not change the existing behavior. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [eliminate cap_ppc_fwnmi, add fwnmi cap to migration state and reprhase the commit message] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200130184423.20519-3-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:10 +11:00
David Gibson	37965dfe4d	spapr: Enable DD2.3 accelerated count cache flush in pseries-5.0 machine For POWER9 DD2.2 cpus, the best current Spectre v2 indirect branch mitigation is "count cache disabled", which is configured with: -machine cap-ibs=fixed-ccd However, this option isn't available on DD2.3 CPUs with KVM, because they don't have the count cache disabled. For POWER9 DD2.3 cpus, it is "count cache flush with assist", configured with: -machine cap-ibs=workaround,cap-ccf-assist=on However this option isn't available on DD2.2 CPUs with KVM, because they don't have the special CCF assist instruction this relies on. On current machine types, we default to "count cache flush w/o assist", that is: -machine cap-ibs=workaround,cap-ccf-assist=off This runs, with mitigation on both DD2.2 and DD2.3 host cpus, but has a fairly significant performance impact. It turns out we can do better. The special instruction that CCF assist uses to trigger a count cache flush is a no-op on earlier CPUs, rather than trapping or causing other badness. It doesn't, of itself, implement the mitigation, but if we have count-cache-disabled, then the count cache flush is unnecessary, and so using the count cache flush mitigation is harmless. Therefore for the new pseries-5.0 machine type, enable cap-ccf-assist by default. Along with that, suppress throwing an error if cap-ccf-assist is selected but KVM doesn't support it, as long as KVM is giving us count-cache-disabled. To allow TCG to work out of the box, even though it doesn't implement the ccf flush assist, downgrade the error in that case to a warning. This matches several Spectre mitigations where we allow TCG to operate for debugging, since we don't really make guarantees about TCG security properties anyway. While we're there, make the TCG warning for this case match that for other mitigations. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au>	2020-02-03 11:33:02 +11:00
Aleksandar Markovic	6cdda0ff4b	hw/core/loader: Let load_elf() populate a field with CPU-specific flags While loading the executable, some platforms (like AVR) need to detect CPU type that executable is built for - and, with this patch, this is enabled by reading the field 'e_flags' of the ELF header of the executable in question. The change expands functionality of the following functions: - load_elf() - load_elf_as() - load_elf_ram() - load_elf_ram_sym() The argument added to these functions is called 'pflags' and is of type 'uint32_t*' (that matches 'pointer to 'elf_word'', 'elf_word' being the type of the field 'e_flags', in both 32-bit and 64-bit variants of ELF header). Callers are allowed to pass NULL as that argument, and in such case no lookup to the field 'e_flags' will happen, and no information will be returned, of course. CC: Richard Henderson <rth@twiddle.net> CC: Peter Maydell <peter.maydell@linaro.org> CC: Edgar E. Iglesias <edgar.iglesias@gmail.com> CC: Michael Walle <michael@walle.cc> CC: Thomas Huth <huth@tuxfamily.org> CC: Laurent Vivier <laurent@vivier.eu> CC: Philippe Mathieu-Daudé <f4bug@amsat.org> CC: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com> CC: Aurelien Jarno <aurelien@aurel32.net> CC: Jia Liu <proljc@gmail.com> CC: David Gibson <david@gibson.dropbear.id.au> CC: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> CC: BALATON Zoltan <balaton@eik.bme.hu> CC: Christian Borntraeger <borntraeger@de.ibm.com> CC: Thomas Huth <thuth@redhat.com> CC: Artyom Tarasenko <atar4qemu@gmail.com> CC: Fabien Chouteau <chouteau@adacore.com> CC: KONRAD Frederic <frederic.konrad@adacore.com> CC: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com> Signed-off-by: Michael Rolnik <mrolnik@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> Message-Id: <1580079311-20447-24-git-send-email-aleksandar.markovic@rt-rk.com>	2020-01-29 19:28:52 +01:00
Peter Xu	1df2c9a26f	migration: Define VMSTATE_INSTANCE_ID_ANY Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who wants to auto-generate the vmstate instance ID. Previously it was hard coded as -1 instead of this macro. It helps to change this default value in the follow up patches. No functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Cédric Le Goater	baa45b1710	spapr/xive: remove redundant check in spapr_match_nvt() spapr_match_nvt() is a XIVE operation and is used by the machine to look for a matching target when an event notification is being delivered. An assert checks that spapr_match_nvt() is called only when the machine has selected the XIVE interrupt mode but it is redundant with the XIVE_PRESENTER() dynamic cast. Apply the cast to spapr->active_intc and remove the assert. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106163207.4608-1-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Daniel Henrique Barboza	9b6c1da5e9	spapr.c: remove 'out' label in spapr_dt_cas_updates() 'out' can be replaced by 'return' with the appropriate return value. CC: David Gibson <david@gibson.dropbear.id.au> CC: qemu-ppc@nongnu.org Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200106182425.20312-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Bharata B Rao	905db91697	ppc/spapr: Support reboot of secure pseries guest A pseries guest can be run as a secure guest on Ultravisor-enabled POWER platforms. When such a secure guest is reset, we need to release/reset a few resources both on ultravisor and hypervisor side. This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the machine reset path. As part of this ioctl, the secure guest is essentially transitioned back to normal mode so that it can reboot like a regular guest and become secure again. This ioctl has no effect when invoked for a normal guest. If this ioctl fails for a secure guest, the guest is terminated. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Message-Id: <20191219031445.8949-3-bharata@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
David Gibson	d1d32d6255	spapr: Simplify ovec diff spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set to those bits which are in new but not old, and it returns as a boolean whether or not there are any bits in old but not new. It turns out that both callers only care about the second, not the first. This is basically equivalent to a bitmap subset operation, which is easier to understand and implement. So replace spapr_ovec_diff() with spapr_ovec_subset(). Cc: Mike Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>	2019-12-17 10:39:48 +11:00
David Gibson	0c21e07354	spapr: Fold h_cas_compose_response() into h_client_architecture_support() spapr_h_cas_compose_response() handles the last piece of the PAPR feature negotiation process invoked via the ibm,client-architecture-support OF call. Its only caller is h_client_architecture_support() which handles most of the rest of that process. I believe it was placed in a separate file originally to handle some fiddly dependencies between functions, but mostly it's just confusing to have the CAS process split into two pieces like this. Now that compose response is simplified (by just generating the whole device tree anew), it's cleaner to just fold it into h_client_architecture_support(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
David Gibson	97b32a6afa	spapr: Improve handling of fdt buffer size Previously, spapr_build_fdt() constructed the device tree in a fixed buffer of size FDT_MAX_SIZE. This is a bit inflexible, but more importantly it's awkward for the case where we use it during CAS. In that case the guest firmware supplies a buffer and we have to awkwardly check that what we generated fits into it afterwards, after doing a lot of size checks during spapr_build_fdt(). Simplify this by having spapr_build_fdt() take a 'space' parameter. For the CAS case, we pass in the buffer size provided by SLOF, for the machine init case, we continue to pass FDT_MAX_SIZE. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
Vladimir Sementsov-Ogievskiy	cdcca22aab	ppc: well form kvmppc_hint_smt_possible error hint helper Make kvmppc_hint_smt_possible hint append helper well formed: rename errp to errp_in, as it is IN-parameter here (which is unusual for errp), rename function to be kvmppc_error_append_*_hint. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20191127191434.20945-1-vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	932de7aef8	ppc/spapr: Implement the XiveFabric interface The CAM line matching sequence in the pseries machine does not change much apart from the use of the new QOM interfaces. There is an extra indirection because of the sPAPR IRQ backend of the machine. Only the XIVE backend implements the new 'match_nvt' handler. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191125065820.927-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cornelia Huck	3eb74d2087	hw: add compat machines for 5.0 Add 5.0 machine types for arm/i440fx/q35/s390x/spapr. For i440fx and q35, unversioned cpu models are still translated to -v1; I'll leave changing this (if desired) to the respective maintainers. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20191112104811.30323-1-cohuck@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>	2019-12-14 10:25:50 +01:00
Evgeny Yakovlev	5f2585772f	virtio-blk: advertise F_WCE (F_FLUSH) if F_CONFIG_WCE is advertised Virtio spec 1.1 (and earlier), 5.2.5.2 Driver Requirements: Device Initialization: "Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it if they offer VIRTIO_BLK_F_CONFIG_WCE" Currently F_CONFIG_WCE and F_WCE are not connected to each other. Qemu will advertise F_CONFIG_WCE if config-wce argument is set for virtio-blk device. And F_WCE is advertised only if underlying block backend actually has it's caching enabled. Fix this by advertising F_WCE if F_CONFIG_WCE is also advertised. To preserve backwards compatibility with newer machine types make this behaviour governed by "x-enable-wce-if-config-wce" virtio-blk-device property and introduce hw_compat_4_2 with new property being off by default for all machine types <= 4.2 (but don't introduce 4.3 machine type itself yet). Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru> Message-Id: <1572978137-189218-1-git-send-email-wrfsh@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-12-13 11:22:06 +00:00
Alexey Kardashevskiy	a49f62b9fd	spapr: Add /chosen to FDT only at reset time to preserve kernel and initramdisk Since "spapr: Render full FDT on ibm,client-architecture-support" we build the entire flatten device tree (FDT) twice - at the reset time and when "ibm,client-architecture-support" (CAS) is called. The full FDT from CAS is then applied on top of the SLOF internal device tree. This is mostly ok, however there is a case when the QEMU is started with -initrd and for some reason the guest decided to move/unpack the init RAM disk image - the guest correctly notifies SLOF about the change but at CAS it is overridden with the QEMU initial location addresses and the guest may fail to boot if the original initrd memory was changed. This fixes the problem by only adding the /chosen node at the reset time to prevent the original QEMU's linux,initrd-start/linux,initrd-end to override the updated addresses. This only treats /chosen differently as we know there is a special case already and it is unlikely anything else will need to change /chosen at CAS we are better off not touching /chosen after we handed it over to SLOF. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20191024041308.5673-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2019-11-18 11:50:33 +01:00
Greg Kurz	47c8c915b1	spapr: Don't request to unplug the same core twice We must not call spapr_drc_detach() on a detached DRC otherwise bad things can happen, ie. QEMU hangs or crashes. This is easily demonstrated with a CPU hotplug/unplug loop using QMP. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157185826035.3073024.1664101000438499392.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 09:37:54 +11:00
David Gibson	54255c1f65	spapr: Move SpaprIrq::nr_xirqs to SpaprMachineClass For the benefit of peripheral device allocation, the number of available irqs really wants to be the same on a given machine type version, regardless of what irq backends we are using. That's the case now, but only because we make sure the different SpaprIrq instances have the same value except for the special legacy one. Since this really only depends on machine type version, move the value to SpaprMachineClass instead of SpaprIrq. This also puts the code to set it to the lower value on old machine types right next to setting legacy_irq_allocation, which needs to go hand in hand. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	8cbe71ecb8	spapr: Remove SpaprIrq::nr_msis The nr_msis value we use here has to line up with whether we're using legacy or modern irq allocation. Therefore it's safer to derive it based on legacy_irq_allocation rather than having SpaprIrq contain a canned value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	05289273c0	spapr, xics, xive: Move dt_populate from SpaprIrq to SpaprInterruptController This method depends only on the active irq controller. Now that we've formalized the notion of active controller we can dispatch directly through that, rather than dispatching via SpaprIrq with the dual version having to do a second conditional dispatch. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	328d8eb24d	spapr, xics, xive: Move print_info from SpaprIrq to SpaprInterruptController This method depends only on the active irq controller. Now that we've formalized the notion of active controller we can dispatch directly through that, rather than dispatching via SpaprIrq with the dual version having to do a second conditional dispatch. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
Greg Kurz	29cb418749	spapr: Set VSMT to smp_threads by default Support for setting VSMT is available in KVM since linux-4.13. Most distros that support KVM on POWER already have it. It thus seem reasonable enough to have the default machine to set VSMT to smp_threads. This brings contiguous VCPU ids and thus brings their upper bound down to the machine's max_cpus. This is especially useful for XIVE KVM devices, which may thus allocate only one VP descriptor per VCPU. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157010411885.246126.12610015369068227139.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 09:36:55 +11:00
Tao Xu	0533ef5f20	numa: Introduce MachineClass::auto_enable_numa for implicit NUMA node Add MachineClass::auto_enable_numa field. When it is true, a NUMA node is expected to be created implicitly. Acked-by: David Gibson <david@gibson.dropbear.id.au> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190905083238.1799-1-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-10-15 18:18:08 -03:00
David Gibson	ca62823b79	spapr: Use less cryptic representation of which irq backends are supported SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector 5 which indicates whether XICS, XIVE or both interrupt controllers are available. As usual for PAPR, the encoding is kind of overly complicated and confusing (though to be fair there are some backwards compat things it has to handle). But to make our internal code clearer, have SpaprIrq encode more directly which backends are available as two booleans, and derive the OV5 value from that at the point we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
Alexey Kardashevskiy	e68cd0cb5c	spapr: Render full FDT on ibm,client-architecture-support The ibm,client-architecture-support call is a way for the guest to negotiate capabilities with a hypervisor. It is implemented as: - the guest calls SLOF via client interface; - SLOF calls QEMU (H_CAS hypercall) with an options vector from the guest; - QEMU returns a device tree diff (which uses FDT format with an additional header before it); - SLOF walks through the partial diff tree and updates its internal tree with the values from the diff. This changes QEMU to simply re-render the entire tree and send it as an update. SLOF can handle this already mostly, [1] is needed before this can be applied. This stores the resulting tree in the spapr machine to have the latest valid FDT copy possible (this should not matter much as H_UPDATE_DT happens right after that but nevertheless). The benefit is reduced code size as there is no need for another set of DT rendering helpers such as spapr_fixup_cpu_dt(). The downside is that the updates are bigger now (as they include all nodes and properties) but the difference on a '-smp 256,threads=1' system before/after is 2.35s vs. 2.5s. [1] https://patchwork.ozlabs.org/patch/1152915/ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 19:08:21 +10:00
Alexey Kardashevskiy	744a928cce	spapr: Stop providing RTAS blob SLOF implements one itself so let's remove it from QEMU. It is one less image and simpler setup as the RTAS blob never stays in its initial place anyway as the guest OS always decides where to put it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy	5ced78955f	spapr: Do not put empty properties for -kernel/-initrd/-append We are going to use spapr_build_fdt() for the boot time FDT and as an update for SLOF during handling of H_CAS. SLOF will apply all properties from the QEMU's FDT which is usually ok unless there are properties changed by grub or guest kernel. The properties are: bootargs, linux,initrd-start, linux,initrd-end, linux,stdout-path, linux,rtas-base, linux,rtas-entry. Resetting those during CAS will most likely cause grub failure. Don't create such properties if we're booting without "-kernel" and "-initrd" so they won't get included into the DT update blob and therefore the guest is more likely to boot successfully. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Tweaked commit message based on Greg Kurz's input] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy	3a17e38f6e	spapr: Skip leading zeroes from memory@ DT node names The device tree build by QEMU at the machine reset time is used by SLOF to build its internal device tree but the node names are not preserved exactly so when QEMU provides a device tree update in response to H_CAS, it might become tricky to match a node from the update blob to the actual node in SLOF. This removed leading zeroes from "memory@" nodes and makes the DTC checker happy. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy	f767b1ac57	spapr: Fixes a leak in CAS Add a missing g_free(fdt) if the resulting tree is bigger than the space allocated by SLOF. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-04 10:25:23 +10:00
David Gibson	db5127b28a	spapr: Move handling of special NVLink numa node from reset to init The number of NUMA nodes in the system is fixed from the command line. Therefore, there's no need to recalculate it at reset time, and we can determine the special gpu_numa_id value used for NVLink2 devices at init time. This simplifies the reset path a bit which will make further improvements easier. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2019-10-04 10:25:23 +10:00
David Gibson	daa36379ce	spapr: Simplify handling of pre ISA 3.0 guest workaround handling Certain old guest versions don't understand the radix MMU introduced with POWER ISA 3.0, but incorrectly select it if presented with the option at CAS time. We workaround this in qemu by explicitly excluding the radix (and other ISA 3.0 linked) options if the guest doesn't explicitly note support for ISA 3.0. This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for. In addition, we unnecessarily call spapr_populate_pa_features() with different options when initially constructing the device tree and when adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest is already false, so we can still use the flag, rather than explicitly overriding it to be false at the callsite. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2019-10-04 10:25:23 +10:00
Greg Kurz	f041d6af55	spapr: Report kvm_irqchip_in_kernel() in 'info pic' Unless the machine was started with kernel-irqchip=on, we cannot easily tell if we're actually using an in-kernel or an emulated irqchip. This information is important enough that it is worth printing it in 'info pic'. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156829860985.2073005.5893493824873412773.stgit@bahia.tls.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 10:25:23 +10:00
Laurent Vivier	58c46efa45	pseries: do not allow memory-less/cpu-less NUMA node When we hotplug a CPU on memory-less/cpu-less node, the linux kernel crashes. This happens because linux kernel needs to know the NUMA topology at start to be able to initialize the distance lookup table. On pseries, the topology is provided by the firmware via the existing CPUs and memory information. Thus a node without memory and CPU cannot be discovered by the kernel. To avoid the kernel crash, do not allow to start pseries with empty nodes. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20190830161345.22436-1-lvivier@redhat.com> [dwg: Rework to cope with movement of numa state from globals to MachineState] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 10:25:23 +10:00
Dr. David Alan Gilbert	ce62df5378	migration: register_savevm_live doesn't need dev Commit `78dd48df3` removed the last caller of register_savevm_live for an instantiable device (rather than a single system wide device); so trim out the parameter. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190822115433.12070-1-dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:15:03 +01:00
Peter Maydell	f884294bd7	Machine + x86 queue, 2019-09-03 Bug fixes: * Fix die-id validation regression (Eduardo Habkost) * vmmouse: Properly reset state (Jan Kiszka) * hostmem-file: fix pmem file size check (Stefan Hajnoczi) * Keep query-hotpluggable-cpus output compatible with older QEMU if '-smp dies' is not set (Igor Mammedov) * migration: Do not re-read the clock on pre_save in case of paused guest (Maxiwell S. Garcia) Cleanups: * NUMA code cleanups (Tao Xu) * Remove stale externs from includes (Alex Bennée) Features: * qapi: report the default CPU type for each machine (Daniel P. Berrangé) -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl1u08EUHGVoYWJrb3N0 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaaKGQ//WQY+JQgXj2M7i5bAuz1lkR0QKJvh n++70ugqNmmlj1YH7LKmZNll0tz+auo25PLgEBOamPZPFQXxkRhPBxTUnOdQJ1UC bSwyRzHrFluVITXD/nGkIXgmP4rjXil5QBWTxneWb7zYsXDGBEnauZnC1YsXzc9T 5LISvc5zEz6pEzz5s3LdUJ947jTui/dDHVHupeyK/5bPkiPoKVoymsd4p8rvAmFw 4obMftjuFzklm8oLPKpHYAm7VvXj5yb92/FE/ZKdaahcLPGStWixiHJ7xJlGMBti GqcWca+2sdbsraOz4Pg05x//vbOgiwIECqgKJRlJSAnG7Roz7E6J/xXQIYIkhpkL Sn0+s181WtFeNFlQgEP056iTUCq81oBjek2XzgsXzuQyFip5IJGLLQox4E+w0ty6 7houoCkJD70ddl3sEj/koXi6rBeswNStfuxVYxUgwYa7HecehNvVD5q9NlElRhev Lce4szuWJzHBbhW5ubGmN6rCbXNa+mPrBunrDwbjApl12DFkr163dj9DsyN/DUgy MmfsgqpKZ+g18VSajck2QtvTg+9Oqv0bv3SWtpDwzDxS9VULz0r2wfcN9TZDipV0 qCZWg39BpCIgdd4s5L0q6bamC9+eSwoByFx54WrkoQT81odHJqUHNsCE9wnoNvmG aZlV3idjGmsTFiE= =u5HZ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging Machine + x86 queue, 2019-09-03 Bug fixes: * Fix die-id validation regression (Eduardo Habkost) * vmmouse: Properly reset state (Jan Kiszka) * hostmem-file: fix pmem file size check (Stefan Hajnoczi) * Keep query-hotpluggable-cpus output compatible with older QEMU if '-smp dies' is not set (Igor Mammedov) * migration: Do not re-read the clock on pre_save in case of paused guest (Maxiwell S. Garcia) Cleanups: * NUMA code cleanups (Tao Xu) * Remove stale externs from includes (Alex Bennée) Features: * qapi: report the default CPU type for each machine (Daniel P. Berrangé) # gpg: Signature made Tue 03 Sep 2019 21:57:37 BST # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: migration: Do not re-read the clock on pre_save in case of paused guest x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set i386/vmmouse: Properly reset state hostmem-file: fix pmem file size check qapi: report the default CPU type for each machine pc: Don't make die-id mandatory unless necessary pc: Improve error message when die-id is omitted pc: Fix error message on die-id validation numa: move numa global variable numa_info into MachineState numa: move numa global variable have_numa_distance into MachineState numa: move numa global variable nb_numa_nodes into MachineState hw/arm: simplify arm_load_dtb includes: remove stale [smp\|max]_cpus externs Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-09-04 14:44:54 +01:00
Tao Xu	7e721e7b10	numa: move numa global variable numa_info into MachineState Move existing numa global numa_info (renamed as "nodes") into NumaState. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-5-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-09-03 11:26:55 -03:00
Tao Xu	aa57020774	numa: move numa global variable nb_numa_nodes into MachineState Add struct NumaState in MachineState and move existing numa global nb_numa_nodes(renamed as "num_nodes") into NumaState. And add variable numa_support into MachineClass to decide which submachines support NUMA. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-3-tao3.xu@intel.com> [ehabkost: include hw/boards.h again to fix build failures] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-09-03 11:26:55 -03:00
Greg Kurz	b1e8156743	spapr: Set compat mode in spapr_core_plug() A recent change in spapr_machine_reset() showed that resetting the compat mode in spapr_machine_reset() for the boot vCPU and in spapr_cpu_reset() for all other vCPUs was fragile. The fix was thus to reset the compat mode for all vCPUs in spapr_machine_reset(), but we still have to propagate it to hot-plugged CPUs. This is still performed from spapr_cpu_reset(), hence resulting in ppc_set_compat() being called twice for every vCPU at machine reset. Apart from wasting cycles, which isn't really an issue during machine reset, this seems to indicate that spapr_cpu_reset() isn't the best place to set the compat mode. A natural candidate for CPU-hotplug specific code is spapr_core_plug(). Also, it sits in the same file as spapr_machine_reset() : this makes it easier for someone who wants to know when the compat PVR is set. Call ppc_set_compat() from there. This doesn't need to be done for initial vCPUs since the compat PVR is 0 and spapr_machine_reset() sets the appropriate value later. No need to do this on manually added vCPUS on the destination QEMU during migration since the compat PVR is part of the migrated vCPU state. Both conditions can be checked with spapr_drc_hotplugged(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156701285312.499757.7807417667750711711.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-29 09:46:07 +10:00
Alexey Kardashevskiy	6c3829a265	spapr_pci: Advertise BAR reallocation capability The pseries guests do not normally allocate PCI resources and rely on the system firmware doing so. Furthermore at least at some point in the past the pseries guests won't even allowed to change BARs, probably it is still the case for phyp. So since the initial commit we have [1] which prevents resource reallocation. This is not a problem until we want specific BAR alignments, for example, PAGE_SIZE==64k to make sure we can still map MMIO BARs directly. For the boot time devices we handle this in SLOF [2] but since QEMU's RTAS does not allocate BARs, the guest does this instead and does not align BARs even if Linux is given pci=resource_alignment=16@pci:0:0 as PCI_PROBE_ONLY makes Linux ignore alignment requests. ARM folks added a dial to control PCI_PROBE_ONLY via the device tree [3]. This makes use of the dial to advertise to the guest that we can handle BAR reassignments. This limits the change to the latest pseries machine to avoid old guests explosion. We do not remove the flag from [1] as pseries guests are still supported under phyp so having that removed may cause problems. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/platforms/pseries/setup.c?h=v5.1#n773 [2] https://git.qemu.org/?p=SLOF.git;a=blob;f=board-qemu/slof/pci-phb.fs;h=06729bcf77a0d4e900c527adcd9befe2a269f65d;hb=HEAD#l338 [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f81c11af Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190719043734.108462-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-29 09:46:07 +10:00
Laurent Vivier	ce03a193e1	pseries: Fix compat_pvr on reset If we a migrate P8 machine to a P9 machine, the migration fails on destination with: error while loading state for instance 0x1 of device 'cpu' load of migration failed: Operation not permitted This is caused because the compat_pvr field is only present for the first CPU. Originally, spapr_machine_reset() calls ppc_set_compat() to set the value max_compat_pvr for the first cpu and this was propagated to all CPUs by spapr_cpu_reset(). Now, as spapr_cpu_reset() is called before that, the value is not propagated to all CPUs and the migration fails. To fix that, propagate the new value to all CPUs in spapr_machine_reset(). Fixes: `25c9780d38` ("spapr: Reset CAS & IRQ subsystem after devices") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20190826090812.19080-1-lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-29 09:46:07 +10:00
Peter Maydell	f3b8f18ebf	Monitor patches for 2019-08-21 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1dZKsSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTJ4QP/10izA+dSofQ9404GRq3TNzwRCKugU44 nES9CqDh6x5emx+ADQWYkugblgfH9GOvUaAUNtY+uFaEr55yC/F+VWeVXvyjt5U6 ZpPZqIRDOHo2+PZrddr/KcKmiomS6plz03m9bzb3pYN1yIl2ZzgClAhAqWQLk0WB wwiY+YsJ83YR4sdiRMZkuF+UL7N8fSqYvIIj0yzM8+8ONDor9n16PoPeFg3JSsyG aMxXIUnSBZAVtClaNkUPtS0Wf9XEuqoG1rvMRV4Vv+eeb7fwA414DqanRJdLlGMA yNRtFcVyztCfjgVEXnY9JJlFe6pDkoe8ycoimQ4YA60C9c1DIMHqyjFWXRHfDwk8 bYMSX6CTpfoEvbTfmwqYR6KSkb/KuXiFDmcYlTYFvIt3grhhdHQbru9vy+E5sm/b j3CPV2DTCkeGY+oZFfKIaQT9yoWZOhmMY5doMTYyinXygPTGQROUrHtzUeRXKmJZ arqDRmh+mlEiGETNeYQCI45eYCSDYxO+UNrhszxhmv6B1+ixhIrV2oXhi61vVBeY yngY4EILbuA2Z/E4BevJk91ESWJTr3UP13c6p7yf21iN4BD1KkHy5HoXCgYfQDeV 4kar49g6WQ/VQEiwhi65Xd0OwstynkcV69F+kMagVMgaLeRsdU5ikGJQzxTeWJRl SPpc7oDwuAS+ =2F3E -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2019-08-21' into staging Monitor patches for 2019-08-21 # gpg: Signature made Wed 21 Aug 2019 16:35:07 BST # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-monitor-2019-08-21: monitor/qmp: Update comment for commit `4eaca8de26` qdev: Collect HMP handlers command handlers in qdev-monitor.c qapi: Move query-target from misc.json to machine.json hw/core: Move cpu.c, cpu.h from qom/ to hw/core/ Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-22 10:31:21 +01:00
Markus Armbruster	2e5b09fd0e	hw/core: Move cpu.c, cpu.h from qom/ to hw/core/ Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190709152053.16670-2-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Rebased onto merge commit 95a9457fd44; missed instances of qom/cpu.h in comments replaced]	2019-08-21 13:24:01 +02:00
Greg Kurz	e1588bcdd2	spapr/irq: Drop spapr_irq_msi_reset() PHBs already take care of clearing the MSIs from the bitmap during reset or unplug. No need to do this globally from the machine code. Rather add an assert to ensure that PHBs have acted as expected. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156415228966.1064338.190189424190233355.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix crash in qtest case where spapr->irq_map can be NULL at the new assert()] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:39 +10:00
Nicholas Piggin	93eac7b8f4	spapr: Implement ibm,suspend-me This has been useful to modify and test the Linux pseries suspend code but it requires modification to the guest to call it (due to being gated by other unimplemented features). It is not otherwise used by Linux yet, but work is slowly progressing there. This allows a (lightly modified) guest kernel to suspend with `echo mem > /sys/power/state` and be resumed with system_wakeup monitor command. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190722061752.22114-2-npiggin@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:39 +10:00
Michael Roth	0fb6bd0732	spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy This implements the H_TPM_COMM hypercall, which is used by an Ultravisor to pass TPM commands directly to the host's TPM device, or a TPM Resource Manager associated with the device. This also introduces a new virtual device, spapr-tpm-proxy, which is used to configure the host TPM path to be used to service requests sent by H_TPM_COMM hcalls, for example: -device spapr-tpm-proxy,id=tpmp0,host-path=/dev/tpmrm0 By default, no spapr-tpm-proxy will be created, and hcalls will return H_FUNCTION. The full specification for this hypercall can be found in docs/specs/ppc-spapr-uv-hcalls.txt Since SVM-related hcalls like H_TPM_COMM use a reserved range of 0xEF00-0xEF80, we introduce a separate hcall table here to handle them. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com Message-Id: <20190717205842.17827-3-mdroth@linux.vnet.ibm.com> [dwg: Corrected #include for upstream change] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	107413142b	spapr: Implement H_JOIN This has been useful to modify and test the Linux pseries suspend code but it requires modification to the guest to call it (due to being gated by other unimplemented features). It is not otherwise used by Linux yet, but work is slowly progressing there. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-5-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	3a6e6224a9	spapr: Implement H_PROD H_PROD is added, and H_CEDE is modified to test the prod bit according to PAPR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-3-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:12 +10:00
Nicholas Piggin	03ef074c04	spapr: Implement dispatch tracking for tcg Implement cpu_exec_enter/exit on ppc which calls into new methods of the same name in PPCVirtualHypervisorClass. These are used by spapr to implement the splpar VPA dispatch counter initially. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-2-npiggin@gmail.com> [dwg: Removed unnecessary CONFIG_USER_ONLY checks as suggested by gkurz] Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:11 +10:00
David Gibson	d15d4ad64f	spapr_pci: Allow 2MiB and 16MiB IOMMU pagesizes by default We've had the qemu and kernel KVM infrastructure to handle larger TCE page sizes for a while, but forgot to update the defaults to actually allow them. This turns that change on. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:16:22 +10:00
Cornelia Huck	9aec2e52ce	hw: add compat machines for 4.2 Add 4.2 machine types for arm/i440fx/q35/s390x/spapr. For i440fx and q35, unversioned cpu models are still translated to -v1, as `0788a56bd1` ("i386: Make unversioned CPU models be aliases") states this should only transition to the latest cpu model version in 4.3 (or later). Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20190724103524.20916-1-cohuck@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 11:32:11 +10:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Markus Armbruster	b58c5c2dd2	numa: Move remaining NUMA declarations from sysemu.h to numa.h Commit `e35704ba9c` "numa: Move NUMA declarations from sysemu.h to numa.h" left a few NUMA-related macros behind. Move them now. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-26-armbru@redhat.com>	2019-08-16 13:31:53 +02:00
Markus Armbruster	a27bd6c779	Include hw/qdev-properties.h less In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>	2019-08-16 13:31:53 +02:00
Markus Armbruster	650d103d3e	Include hw/hw.h exactly where needed In my "build everything" tree, changing hw/hw.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The previous commits have left only the declaration of hw_error() in hw/hw.h. This permits dropping most of its inclusions. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-19-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	ca77ee28e0	Include migration/qemu-file-types.h a lot less In my "build everything" tree, changing migration/qemu-file-types.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The culprit is again hw/hw.h, which supposedly includes it for convenience. Include migration/qemu-file-types.h only where it's needed. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-10-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	71e8a91585	Include sysemu/reset.h a lot less In my "build everything" tree, changing sysemu/reset.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The main culprit is hw/hw.h, which supposedly includes it for convenience. Include sysemu/reset.h only where it's needed. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-9-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
David Gibson	25c9780d38	spapr: Reset CAS & IRQ subsystem after devices This fixes a nasty regression in qemu-4.1 for the 'pseries' machine, caused by the new "dual" interrupt controller model. Specifically, qemu can crash when used with KVM if a 'system_reset' is requested while there's active I/O in the guest. The problem is that in spapr_machine_reset() we: 1. Reset the CAS vector state spapr_ovec_cleanup(spapr->ov5_cas); 2. Reset all devices qemu_devices_reset() 3. Reset the irq subsystem spapr_irq_reset(); However (1) implicitly changes the interrupt delivery mode, because whether we're using XICS or XIVE depends on the CAS state. We don't properly initialize the new irq mode until (3) though - in particular setting up the KVM devices. During (2), we can temporarily drop the BQL allowing some irqs to be delivered which will go to an irq system that's not properly set up. Specifically, if the previous guest was in (KVM) XIVE mode, the CAS reset will put us back in XICS mode. kvm_kernel_irqchip() still returns true, because XIVE was using KVM, however XICs doesn't have its KVM components intialized and kernel_xics_fd == -1. When the irq is delivered it goes via ics_kvm_set_irq() which assert()s that kernel_xics_fd != -1. This change addresses the problem by delaying the CAS reset until after the devices reset. The device reset should quiesce all the devices so we won't get irqs delivered while we mess around with the IRQ. The CAS reset and irq re-initialize should also now be under the same BQL critical section so nothing else should be able to interrupt it either. We also move the spapr_irq_msi_reset() used in one of the legacy irq modes, since it logically makes sense at the same point as the spapr_irq_reset() (it's essentially an equivalent operation for older machine types). Since we don't need to switch between different interrupt controllers for those old machine types it shouldn't actually be broken in those cases though. Cc: Cédric Le Goater <clg@kaod.org> Fixes: `b2e22477` "spapr: add a 'reset' method to the sPAPR IRQ backend" Fixes: `13db0cd9` "spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS" Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-13 15:59:21 +10:00
Igor Mammedov	cd5ff8333a	machine: show if CLI option '-numa node,mem' is supported in QAPI schema Legacy '-numa node,mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's no possible to replace it with an alternative '-numa memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping option working with old machine types only. In order to help users to find out if being deprecated CLI option '-numa node,mem' is still supported by particular machine type, add new "numa-mem-supported" property to output of query-machines. "numa-mem-supported" is set to 'true' for machines that currently support NUMA, but it will be flipped to 'false' later on, once deprecation period expires and kept 'true' only for old machine types that used to support the legacy option so it won't break existing configuration that are using it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1560172207-378962-1-git-send-email-imammedo@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:08:03 -03:00
Like Xu	fe6b6346e9	hw/ppc: Replace global smp variables with machine smp properties The global smp variables in ppc are replaced with smp machine properties. A local variable of the same name would be introduced in the declaration phase if it's used widely in the context OR replace it on the spot if it's only used once. No semantic changes. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20190518205428.90532-5-like.xu@linux.intel.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:07:36 -03:00
Like Xu	a0628599fa	machine: Refactor smp-related call chains to pass MachineState To get rid of the global smp_* variables we're currently using, it's recommended to pass MachineState in the list of incoming parameters for functions that use global smp variables, thus some redundant parameters are dropped. It's applied for legacy smbios_(), _machine_reset(), hot_add_cpu() and mips *_create_cpu(). Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190518205428.90532-3-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:07:36 -03:00
Peter Maydell	a050901d4b	ppc patch queue 2019-06-12 Next pull request against qemu-4.1. The big thing here is adding support for hot plug of P2P bridges, and PCI devices under P2P bridges on the "pseries" machine (which doesn't use SHPC). Other than that there's just a handful of fixes and small enhancements. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl0AkgwACgkQbDjKyiDZ s5Jyug//cwxP+t1t2CNHtffKwiXFzuEKx9YSNE1V0wog6aB40EbPKU72FzCq6FfA lev+pZWV9AwVMzFYe4VM/7Lqh7WFMYDT3DOXaZwfANs4471vYtgvPi21L2TBj80d hMszlyLWMLY9ByOzCxIq3xnbivGpA94G2q9rKbwXdK4T/5i62Pe3SIfgG+gXiiwW +YlHWCPX0I1cJz2bBs9ElXdl7ONWnn+7uDf7gNfWkTKuiUq6Ps7mxzy3GhJ1T7nz OFKmQ5dKzLJsgOULSSun8kWpXBmnPffkM3+fCE07edrWZVor09fMCk4HvtfaRy2K FFa2Kvzn/V/70TL+44dsSX4QcwdcHQztiaMO7UGPq9CMswx5L7gsNmfX6zvK1Nrb 1t7ORZKNJ72hMyvDPSMiGU2DpVjO3ZbBlSL4/xG8Qeal4An0kgkN5NcFlB/XEfnz dsKu9XzuGSeD1bWz1Mgcf1x7lPDBoHIKLcX6notZ8epP/otu4ywNFvAkPu4fk8s0 4jQGajIT7328SmzpjXClsmiEskpKsEr7hQjPRhu0hFGrhVc+i9PjkmbDl0TYRAf6 N6k6gJQAi+StJde2rcua1iS7Ra+Tka6QRKy+EctLqfqOKPb2VmkZ6fswQ3nfRRlT LgcTHt2iJcLeud2klVXs1e4pKXzXchkVyFL4ucvmyYG5VeimMzU= =ERgu -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190612' into staging ppc patch queue 2019-06-12 Next pull request against qemu-4.1. The big thing here is adding support for hot plug of P2P bridges, and PCI devices under P2P bridges on the "pseries" machine (which doesn't use SHPC). Other than that there's just a handful of fixes and small enhancements. # gpg: Signature made Wed 12 Jun 2019 06:47:56 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.1-20190612: ppc/xive: Make XIVE generate the proper interrupt types ppc/pnv: activate the "dumpdtb" option on the powernv machine target/ppc: Use tcg_gen_gvec_bitsel spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges spapr: Direct all PCI hotplug to host bridge, rather than P2P bridge spapr: Don't use bus number for building DRC ids spapr: Clean up DRC index construction spapr: Clean up spapr_drc_populate_dt() spapr: Clean up dt creation for PCI buses spapr: Clean up device tree construction for PCI devices spapr: Clean up device node name generation for PCI devices target/ppc: Fix lxvw4x, lxvh8x and lxvb16x spapr_pci: Improve error message Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-06-12 14:43:47 +01:00
Markus Armbruster	a8d2532645	Include qemu-common.h exactly where needed No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]	2019-06-12 13:20:20 +02:00
David Gibson	cb60008706	spapr: Direct all PCI hotplug to host bridge, rather than P2P bridge A P2P bridge will attempt to handle the hotplug with SHPC, which doesn't work in the PAPR environment. Instead we want to direct all PCI hotplug actions to the PAPR specific host bridge which will use the PAPR hotplug mechanism. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	9e7d38e8a3	spapr: Clean up spapr_drc_populate_dt() This makes some minor cleanups to spapr_drc_populate_dt(), renaming it to the shorter and more idiomatic spapr_dt_drc() along the way. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	466e883185	spapr: Clean up dt creation for PCI buses Device nodes for PCI bridges (both host and P2P) describe both the bridge device itself and the bus hanging off it, handling of this is a bit of a mess. spapr_dt_pci_device() has a few things it only adds for non-bridges, but always adds #address-cells and #size-cells which should only appear for bridges. But the walking down the subordinate PCI bus is done in one of its callers spapr_populate_pci_devices_dt(). The PHB dt creation in spapr_populate_pci_dt() open codes some similar logic to the bridge case. This patch consolidates things in a bunch of ways: * Bus specific dt info is now created in spapr_dt_pci_bus() used for both P2P bridges and the host bridge. This includes walking subordinate devices * spapr_dt_pci_device() now calls spapr_dt_pci_bus() when called on a P2P bridge * We do detection of bridges with the is_bridge field of the device class, rather than checking PCI config space directly, for consistency with qemu's core PCI code. * Several things are renamed for brevity and clarity Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
Greg Kurz	3725ef1a94	spapr: Don't migrate the hpt_maxpagesize cap to older machine types Commit 0b8c89be7f7b added the hpt_maxpagesize capability to the migration stream. This is okay for new machine types but it breaks backward migration to older QEMUs, which don't expect the extra subsection. Add a compatibility boolean flag to the sPAPR machine class and use it to skip migration of the capability for machine types 4.0 and older. This fixes migration to an older QEMU. Note that the destination will emit a warning: qemu-system-ppc64: warning: cap-hpt-max-page-size lower level (16) in incoming stream than on destination (24) This is expected and harmless though. It is okay to migrate from a lower HPT maximum page size (64k) to a greater one (16M). Fixes: 0b8c89be7f7b "spapr: Add forgotten capability to migration stream" Based-on: <20190522074016.10521-3-clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155853262675.1158324.17301777846476373459.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:47 +10:00
Cédric Le Goater	bd94bc0647	spapr: change default interrupt mode to 'dual' Now that XIVE support is complete (QEMU emulated and KVM devices), change the pseries machine to advertise both interrupt modes: XICS (P7/P8) and XIVE (P9). The machine default interrupt modes depends on the version. Current settings are: pseries default interrupt mode 4.1 dual 4.0 xics 3.1 xics 3.0 legacy xics (different IRQ number space layout) Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190522074016.10521-3-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:47 +10:00
David Gibson	eb3cba8272	spapr: Fix phb_placement backwards compatibility When we added support for NVLink2 passthrough devices, we changed the phb_placement hook to handle the placement of NVLink2 bridges' specific resources. For compatibility we use a version that doesn't do this allocation for old machine types. However, because of the delay between when the patch was posted and when it was merged, we ended up with that compatibility hook applying for machine versions 3.1 and earlier whereas it should apply for 4.0 and earlier (since the patch was applied early in the 4.1 tree). Fixes: `ec132efaa8` "spapr: Support NVIDIA V100 GPU with NVLink2" Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2019-05-29 11:39:45 +10:00
David Gibson	64d4a53431	spapr: Add forgotten capability to migration stream spapr machine capabilities are supposed to be sent in the migration stream so that we can sanity check the source and destination have compatible configuration. Unfortunately, when we added the hpt-max-page-size capability, we forgot to add it to the migration state. This means that we can generate spurious warnings when both ends are configured for large pages, or potentially fail to warn if the source is configured for huge pages, but the destination is not. Fixes: `2309832afd` "spapr: Maximum (HPT) pagesize property" Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-05-29 11:39:45 +10:00
Suraj Jitindar Singh	83f192d34d	target/ppc: Add ibm,purr and ibm,spurr device-tree properties The ibm,purr and ibm,spurr device tree properties are used to indicate that the processor implements the Processor Utilisation of Resources Register (PURR) and Scaled Processor Utilisation of Resources Registers (SPURR), respectively. Each property has a single value which represents the level of architecture supported. A value of 1 for ibm,purr means support for the version of the PURR defined in book 3 in version 2.02 of the architecture. A value of 1 for ibm,spurr means support for the version of the SPURR defined in version 2.05 of the architecture. Add these properties for all processors for which the PURR and SPURR registers are generated. Fixes: `0da6f3fef9` "spapr: Reorganize CPU dt generation code" Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190506014803.21299-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:44 +10:00
Peter Maydell	9ec34ecc97	ppc patch queue 2019-04-26 Here's the first ppc target pull request for qemu-4.1. This has a number of things that have accumulated while qemu-4.0 was frozen. * A number of emulated MMU improvements from Ben Herrenschmidt * Assorted cleanups fro Greg Kurz * A large set of mostly mechanical cleanups from me to make target/ppc much closer to compliant with the modern coding style * Support for passthrough of NVIDIA GPUs using NVLink2 As well as some other assorted fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlzCnusACgkQbDjKyiDZ s5LfhhAAuem5UBGKPKPj33c87HC+GGG+S4y89ic3ebyKplWulGgouHCa4Dnc7Y5m 9MfIEcljRDpuRJCEONo6yg9aaRb3cW2Go9TpTwxmF8o1suG/v5bIQIdiRbBuMa2t yhNujVg5kkWSU1G4mCZjL9FS2ADPsxsKZVd73DPEqjlNJg981+2qtSnfR8SXhfnk dSSKxyfC6Hq1+uhGkLI+xtft+BCTWOstjz+efHpZ5l2mbiaMeh7zMKrIXXy/FtKA ufIyxbZznMS5MAZk7t90YldznfwOCqfh3di1kx8GTZ40LkBKbuI5LLHTG0sT75z5 LHwFuLkBgWmS8RyIRRh9opr7ifrayHx8bQFpW368Qu+PbPzUCcTVIrWUfPmaNR74 CkYJvhiYZfTwKtUeP7b2wUkHpZF4KINI4TKNaS4QAlm3DNbO67DFYkBrytpXsSzv smEpe+sqlbY40olw9q4ESP80r+kGdEPLkRjfdj0R7qS4fsqAH1bjuSkNqlPaCTJQ hNsoz2D+f56z0bBq4x8FRzDpqnBkdy4x6PlLxkJuAaV7WAtvq7n7tiMA3TRr/rIB OYFP2xPNajjP8MfyOB94+S4WDltmsgXoM7HyyvrKp2JBpe7mFjpep5fMp5GUpweV OOYrTsN1Nuu3kFpeimEc+IOyp1BWXnJF4vHhKTOqHeqZEs5Fgus= =RpAK -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190426' into staging ppc patch queue 2019-04-26 Here's the first ppc target pull request for qemu-4.1. This has a number of things that have accumulated while qemu-4.0 was frozen. * A number of emulated MMU improvements from Ben Herrenschmidt * Assorted cleanups fro Greg Kurz * A large set of mostly mechanical cleanups from me to make target/ppc much closer to compliant with the modern coding style * Support for passthrough of NVIDIA GPUs using NVLink2 As well as some other assorted fixes. # gpg: Signature made Fri 26 Apr 2019 07:02:19 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.1-20190426: (36 commits) target/ppc: improve performance of large BAT invalidations ppc/hash32: Rework R and C bit updates ppc/hash64: Rework R and C bit updates ppc/spapr: Use proper HPTE accessors for H_READ target/ppc: Don't check UPRT in radix mode when in HV real mode target/ppc/kvm: Convert DPRINTF to traces target/ppc/trace-events: Fix trivial typo spapr: Drop duplicate PCI swizzle code spapr_pci: Get rid of duplicate code for node name creation target/ppc: Style fixes for translate/spe-impl.inc.c target/ppc: Style fixes for translate/vmx-impl.inc.c target/ppc: Style fixes for translate/vsx-impl.inc.c target/ppc: Style fixes for translate/fp-impl.inc.c target/ppc: Style fixes for translate.c target/ppc: Style fixes for translate_init.inc.c target/ppc: Style fixes for monitor.c target/ppc: Style fixes for mmu_helper.c target/ppc: Style fixes for mmu-hash64.[ch] target/ppc: Style fixes for mmu-hash32.[ch] target/ppc: Style fixes for misc_helper.c ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-04-27 21:34:46 +01:00
Benjamin Herrenschmidt	a2dd4e83e7	ppc/hash64: Rework R and C bit updates With MT-TCG, we are now running translation in a racy way, thus we need to mimic hardware when it comes to updating the R and C bits, by doing byte stores. The current "store_hpte" abstraction is ill suited for this, we replace it with two separate callbacks for setting R and C. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00
Alexey Kardashevskiy	ec132efaa8	spapr: Support NVIDIA V100 GPU with NVLink2 NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver implements special regions for such GPUs and emulates an NVLink bridge. NVLink2-enabled POWER9 CPUs also provide address translation services which includes an ATS shootdown (ATSD) register exported via the NVLink bridge device. This adds a quirk to VFIO to map the GPU memory and create an MR; the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses this to get the MR and map it to the system address space. Another quirk does the same for ATSD. This adds additional steps to sPAPR PHB setup: 1. Search for specific GPUs and NPUs, collect findings in sPAPRPHBState::nvgpus, manage system address space mappings; 2. Add device-specific properties such as "ibm,npu", "ibm,gpu", "memory-block", "link-speed" to advertise the NVLink2 function to the guest; 3. Add "mmio-atsd" to vPHB to advertise the ATSD capability; 4. Add new memory blocks (with extra "linux,memory-usable" to prevent the guest OS from accessing the new memory until it is onlined) and npuphb# nodes representing an NPU unit for every vPHB as the GPU driver uses it for link discovery. This allocates space for GPU RAM and ATSD like we do for MMIOs by adding 2 new parameters to the phb_placement() hook. Older machine types set these to zero. This puts new memory nodes in a separate NUMA node to as the GPU RAM needs to be configured equally distant from any other node in the system. Unlike the host setup which assigns numa ids from 255 downwards, this adds new NUMA nodes after the user configures nodes or from 1 if none were configured. This adds requirement similar to EEH - one IOMMU group per vPHB. The reason for this is that ATSD registers belong to a physical NPU so they cannot invalidate translations on GPUs attached to another NPU. It is guaranteed by the host platform as it does not mix NVLink bridges or GPUs from different NPU in the same IOMMU group. If more than one IOMMU group is detected on a vPHB, this disables ATSD support for that vPHB and prints a warning. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for vfio portions] Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190312082103.130561-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 10:41:23 +10:00
Cornelia Huck	9bf2650bc3	hw: add compat machines for 4.1 Add 4.1 machine types for arm/i440fx/q35/s390x/spapr. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20190411102025.22559-1-cohuck@redhat.com> Acked-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-04-25 14:16:41 -03:00
Cédric Le Goater	273fef83f6	spapr/irq: Add XIVE sanity checks on non-P9 machines On non-P9 machines, the XIVE interrupt mode is not advertised, see spapr_dt_ov5_platform_support(). Add a couple of checks on the machine configuration to filter bogus setups and prevent OS failures : Interrupt modes CPU/Compat XICS XIVE dual P8/P8 OK QEMU failure (1) OK (3) P9/P8 OK QEMU failure (2) OK (3) P9/P9 OK OK OK (1) CPU exception model is incompatible with XIVE and the presenters will fail to realize. (2) CPU exception model is compatible with XIVE, but the XIVE CAS advertisement is dropped when in POWER8 mode. So we could ended up booting with the XIVE DT properties but without the HCALLs. Avoid confusing Linux with such settings and fail under QEMU. (3) force XICS in machine init Remove the check on XIVE-only machines in spapr_machine_init(), which has now become redundant. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190328100044.11408-1-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-29 10:38:20 +11:00
David Gibson	0a794529bd	spapr: Simplify handling of host-serial and host-model values `27461d69a0` "ppc: add host-serial and host-model machine attributes (CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine properties for spapr to explicitly control the values advertised to the guest in device tree properties with the same names. The previous behaviour on KVM was to unconditionally populate the device tree with the real host serial number and model, which leaks possibly sensitive information about the host to the guest. To maintain compatibility for old machine types, we allowed those props to be set to "passthrough" to take the value from the host as before. Or they could be set to "none" to explicitly omit the device tree items. Special casing specific values on what's otherwise a user supplied string is very ugly. So, this patch simplifies things by implementing the backwards compatibility in a different way: we have a machine class flag set for the older machines, and we only load the host values into the device tree if A) they're not set by the user and B) we have that flag set. This does mean that the "passthrough" functionality is no longer available with the current machine type. That's ok though: if a user or management layer really wants the information passed through they can read it themselves (OpenStack Nova already does something similar for x86). It also means the user can't explicitly ask for the values to be omitted on the old machine types. I think that's an acceptable trade-off: if you care enough about not leaking the host information you can either move to the new machine type, or use a dummy value for the properties. For the new machine type, this also removes an odd inconsistency between running on a POWER and non-POWER (or non-Linux) hosts: if the host information couldn't be read from where we expect (in the host's device tree as exposed by Linux), we'd fallback to omitting the guest device tree items. While we're there, improve some poorly worded comments, and the help text for the properties. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2019-03-29 10:25:50 +11:00
David Gibson	ce2918cbc3	spapr: Use CamelCase properly The qemu coding standard is to use CamelCase for type and structure names, and the pseries code follows that... sort of. There are quite a lot of places where we bend the rules in order to preserve the capitalization of internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR". That was a bad idea - it frequently leads to names ending up with hard to read clusters of capital letters, and means they don't catch the eye as type identifiers, which is kind of the point of the CamelCase convention in the first place. In short, keeping type identifiers look like CamelCase is more important than preserving standard capitalization of internal "words". So, this patch renames a heap of spapr internal type names to a more standard CamelCase. In addition to case changes, we also make some other identifier renames: VIOsPAPR* -> SpaprVio* The reverse word ordering was only ever used to mitigate the capital cluster, so revert to the natural ordering. VIOsPAPRVTYDevice -> SpaprVioVty VIOsPAPRVLANDevice -> SpaprVioVlan Brevity, since the "Device" didn't add useful information sPAPRDRConnector -> SpaprDrc sPAPRDRConnectorClass -> SpaprDrcClass Brevity, and makes it clearer this is the same thing as a "DRC" mentioned in many other places in the code This is 100% a mechanical search-and-replace patch. It will, however, conflict with essentially any and all outstanding patches touching the spapr code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:05 +11:00
Suraj Jitindar Singh	68f9f70841	target/ppc/spapr: Enable H_PAGE_INIT in-kernel handling The H_CALL H_PAGE_INIT can be used to zero or copy a page of guest memory. Enable the in-kernel H_PAGE_INIT handler. The in-kernel handler takes half the time to complete compared to handling the H_CALL in userspace. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190306060608.19935-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh	176dcceedd	target/ppc/spapr: Clear partition table entry when allocating hash table If we allocate a hash page table then we know that the guest won't be using process tables, so set the partition table entry maintained for the guest to zero. If this isn't done, then the guest radix bit will remain set in the entry. This means that when the guest calls H_REGISTER_PROCESS_TABLE there will be a mismatch between then flags and the value in spapr->patb_entry, and the call will fail. The guest will then panic: Failed to register process table (rc=-4) kernel BUG at arch/powerpc/platforms/pseries/lpar.c:959 The result being that it isn't possible to boot a hash guest on a P9 system. Also fix a bug in the flags parsing in h_register_process_table() which was introduced by the same patch, and simplify the handling to make it less likely that errors will be introduced in the future. The effect would have been setting the host radix bit LPCR_HR for a hash guest using process tables, which currently isn't supported and so couldn't have been triggered. Fixes: `00fd075e18` "target/ppc/spapr: Set LPCR:HR when using Radix mode" Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190305022102.17610-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh	2782ad4c41	target/ppc/spapr: Enable mitigations by default for pseries-4.0 machine type There are currently 3 mitigations the availability of which is controlled by the spapr-caps mechanism, cap-cfpc, cap-sbbc, and cap-ibs. Enable these mitigations by default for the pseries-4.0 machine type. By now machine firmware should have been upgraded to allow these settings. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301044609.9626-3-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh	8ff43ee404	target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate the requirement for a hw-assisted version of the count cache flush workaround. The count cache flush workaround is a software workaround which can be used to flush the count cache on context switch. Some revisions of hardware may have a hardware accelerated flush, in which case the software flush can be shortened. This cap is used to set the availability of such hardware acceleration for the count cache flush routine. The availability of such hardware acceleration is indicated by the H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com> [dwg: Small style fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh	edaa799559	target/ppc/spapr: Enable the large decrementer for pseries-4.0 Enable the large decrementer by default for the pseries-4.0 machine type. It is disabled again by default_caps_with_cpu() for pre-POWER9 cpus since they don't support the large decrementer. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301024317.22137-4-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh	a8dafa5251	target/ppc: Implement large decrementer support for TCG Prior to POWER9 the decrementer was a 32-bit register which decremented with each tick of the timebase. From POWER9 onwards the decrementer can be set to operate in a mode called large decrementer where it acts as a n-bit decrementing register which is visible as a 64-bit register, that is the value of the decrementer is sign extended to 64 bits (where n is implementation dependant). The mode in which the decrementer operates is controlled by the LPCR_LD bit in the logical paritition control register (LPCR). >From POWER9 onwards the HDEC (hypervisor decrementer) was enlarged to h-bits, also sign extended to 64 bits (where h is implementation dependant). Note this isn't configurable and is always enabled. On POWER9 the large decrementer and hdec are both 56 bits, as represented by the lrg_decr_bits cpu class property. Since they are the same size we only add one property for now, which could be extended in the case they ever differ in the future. We also add the lrg_decr_bits property for POWER5+/7/8 since it is used to determine the size of the hdec, which is only generated on the POWER5+ processor and later. On these processors it is 32 bits. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190301024317.22137-2-sjitindarsingh@gmail.com> [dwg: Small style fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh	c982f5cf9a	target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the availability of the large decrementer for a guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301024317.22137-1-sjitindarsingh@gmail.com> [dwg: Trivial style fix] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Greg Kurz	23ff81bdfd	spapr: Simulate CAS for qtest The RTAS event hotplug code for machine types 2.8 and newer depends on the CAS negotiated ov5 in order to work properly. However, there's no CAS when running under qtest. There has been a tentative to trick the code by faking the OV5_HP_EVT bit, but it turned out to break other assumptions in the code and the change got reverted. Go for a more general approach and simulate a CAS when running under qtest. For simplicity, this pseudo CAS simple simulates the case where the guest supports the same features as the machine. It is done at reset time, just before we reset the DRCs, which could potentially exercise the unplug code. This allows to test unplug on spapr with both older and newer machine types. Suggested-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155146875704.147873.10563808578795890265.stgit@bahia.lan> Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 10:50:59 +11:00
David Hildenbrand	07578b0ad6	qdev: Let the hotplug_handler_unplug() caller delete the device When unplugging a device, at one point the device will be destroyed via object_unparent(). This will, one the one hand, unrealize the removed device hierarchy, and on the other hand, destroy/free the device hierarchy. When chaining hotplug handlers, we want to overwrite a bus hotplug handler by the machine hotplug handler, to be able to perform some part of the plug/unplug and to forward the calls to the bus hotplug handler. For now, the bus hotplug handler would trigger an object_unparent(), not allowing us to perform some unplug action on a device after we forwarded the call to the bus hotplug handler. The device would be gone at that point. machine_unplug_handler(dev) /* eventually do unplug stuff / bus_unplug_handler(dev) / dev is gone, we can't do more unplug stuff / So move the object_unparent() to the original caller of the unplug. For now, keep the unrealize() at the original places of the object_unparent(). For implicitly chained hotplug handlers (e.g. pc code calling acpi hotplug handlers), the object_unparent() has to be done by the outermost caller. So when calling hotplug_handler_unplug() from inside an unplug handler, nothing is to be done. hotplug_handler_unplug(dev) -> calls machine_unplug_handler() machine_unplug_handler(dev) { / eventually do unplug stuff / bus_unplug_handler(dev) -> calls unrealize(dev) / we can do more unplug stuff but device already unrealized / } object_unparent(dev) In the long run, every unplug action should be factored out of the unrealize() function into the unplug handler (especially for PCI). Then we can get rid of the additonal unrealize() calls and object_unparent() will properly unrealize the device hierarchy after the device has been unplugged. hotplug_handler_unplug(dev) -> calls machine_unplug_handler() machine_unplug_handler(dev) { / eventually do unplug stuff / bus_unplug_handler(dev) -> only unplugs, does not unrealize / we can do more unplug stuff */ } object_unparent(dev) -> will unrealize The original approach was suggested by Igor Mammedov for the PCI part, but I extended it to all hotplug handlers. I consider this one step into the right direction. To summarize: - object_unparent() on synchronous unplugs is done by common code -- "Caller of hotplug_handler_unplug" - object_unparent() on asynchronous unplugs ("unplug requests") has to be done manually -- "Caller of hotplug_handler_unplug" Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190228122849.4296-2-david@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-03-06 11:51:08 -03:00
Eric Auger	dc0ca80eb1	hw/boards: Add a MachineState parameter to kvm_type callback On ARM, the kvm_type will be resolved by querying the KVMState. Let's add the MachineState handle to the callback so that we can retrieve the KVMState handle. in kvm_init, when the callback is called, the kvm_state variable is not yet set. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-id: 20190304101339.25970-5-eric.auger@redhat.com [ppc parts] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-03-05 15:55:09 +00:00
Thomas Huth	f6d4dca807	hw/ppc: Use object_initialize_child for correct reference counting Both functions, object_initialize() and object_property_add_child() increase the reference counter of the new object, so one of the references has to be dropped afterwards to get the reference counting right. Otherwise the child object will not be properly cleaned up when the parent gets destroyed. Thus let's use now object_initialize_child() instead to get the reference counting here right. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1550748288-30598-1-git-send-email-thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Michael Roth	dae5e39ada	spapr: enable PHB hotplug for default pseries machine type The 'dr_phb_enabled' field of that class can be set as part of machine-specific init code. It will be used to conditionally enable creation of DRC objects and device-tree description to facilitate hotplug of PHBs. Since we can't migrate this state to older machine types, default the option to true and disable it for older machine types. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059673433.1466090.6188091133769611501.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Greg Kurz	bb2bdd812e	spapr: add hotplug hooks for PHB hotplug Hotplugging PHBs is a machine-level operation, but PHBs reside on the main system bus, so we register spapr machine as the handler for the main system bus. Provide the usual pre-plug, plug and unplug-request handlers. Move the checking of the PHB index to the pre-plug handler. It is okay to do that and assert in the realize function because the pre-plug handler is always called, even for the oldest machine types we support. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> (Fixed interrupt controller phandle in "interrupt-map" and TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Michael Roth	0a0a66cd1b	spapr_pci: provide node start offset via spapr_populate_pci_dt() PHB hotplug re-uses PHB device tree generation code and passes it to a guest via RTAS. Doing this requires knowledge of where exactly in the device tree the node describing the PHB begins. Provide this via a new optional pointer that can be used to store the PHB node's start offset. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059671912.1466090.10891589403973703473.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Nathan Fontenot	3998ccd092	spapr: populate PHB DRC entries for root DT node This add entries to the root OF node to advertise our PHBs as being DR-capable in accordance with PAPR specification. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059670897.1466090.10843921337591637414.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Michael Roth	962b6c3650	spapr: create DR connectors for PHBs Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Greg Kurz	09d876ce2c	spapr/drc: Drop spapr_drc_attach() fdt argument All DRC subtypes have been converted to generate the FDT fragment at configure connector time instead of attach time. The fdt and fdt_offset arguments of spapr_drc_attach() aren't needed anymore. Drop them and make the implementation of the dt_populate() method mandatory. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Greg Kurz	345b12b99e	spapr: Generate FDT fragment for CPUs at configure connector time Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Greg Kurz	62d38c9bd3	spapr: Generate FDT fragment for LMBs at configure connector time Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt	79825f4d58	target/ppc: Rename PATB/PATBE -> PATE That "b" means "base address" and thus shouldn't be in the name of actual entries and related constants. This patch keeps the synthetic patb_entry field of the spapr virtual hypervisor unchanged until I figure out if that has an impact on the migration stream. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt	3054b0ca4b	target/ppc: Fix ordering of hash MMU accesses With mttcg, we can have MMU lookups happening at the same time as the guest modifying the page tables. Since the HPTEs of the hash table MMU contains two words (or double worlds on 64-bit), we need to make sure we read them in the right order, with the correct memory barrier. Additionally, when using emulated SPAPR mode, the hypercalls writing to the hash table must also perform the udpates in the right order. Note: This part is still not entirely correct Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt	00fd075e18	target/ppc/spapr: Set LPCR:HR when using Radix mode The HW relies on LPCR:HR along with the PATE to determine whether to use Radix or Hash mode. In fact it uses LPCR:HR more commonly than the PATE. For us, it's also more efficient to do so, especially since unlike the HW we do not maintain a cache of the current PATE and HV PATE in a generic place. Prepare the grounds for that by ensuring that LPCR:HR is set properly on SPAPR machines. Another option would have been to use a callback to get the PATE but this gets messy when implementing bare metal support, it's much simpler (and faster) to use LPCR. Since existing migration streams may not have it, fix it up in spapr_post_load() as well based on the pseudo-PATE entry that we keep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Prasad J Pandit	27461d69a0	ppc: add host-serial and host-model machine attributes (CVE-2019-8934) On ppc hosts, hypervisor shares following system attributes - /proc/device-tree/system-id - /proc/device-tree/model with a guest. This could lead to information leakage and misuse.[] Add machine attributes to control such system information exposure to a guest. [] https://wiki.openstack.org/wiki/OSSN/OSSN-0028 Reported-by: Daniel P. Berrangé <berrange@redhat.com> Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20190218181349.23885-1-ppandit@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Fabiano Rosas	cc941111a5	spapr: fix out of bounds write in spapr_populate_drmem_v2 buf_len is uint8_t which is not large enough to hold the result of: nr_entries * sizeof(struct sPAPRDrconfCellV2) + sizeof(uint32_t); for a nr_entries greater than 10. This causes the allocated buffer 'int_buf' to be smaller than expected and we eventually overwrite some of glibc's control structures (see "chunk" in https://sourceware.org/glibc/wiki/MallocInternals) The following error is seen while trying to free int_buf: "free(): invalid next size (fast)" Fixes: `a324d6f166` "spapr: Support ibm,dynamic-memory-v2 property" Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20190213172926.21740-1-farosas@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-17 21:54:02 +11:00
Greg Kurz	5c7adcf422	spapr: Rename xics to intc in interrupt controller agnostic code All this code is used with both the XICS and XIVE interrupt controllers. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-17 21:54:02 +11:00
Liam Merwick	4366e1db16	elf: Add optional function ptr to load_elf() to parse ELF notes This patch adds an optional function pointer, 'elf_note_fn', to load_elf() which causes load_elf() to additionally parse any ELF program headers of type PT_NOTE and check to see if the ELF Note is of the type specified by the 'translate_opaque' arg. If a matching ELF Note is found then the specfied function pointer is called to process the ELF note. Passing a NULL function pointer results in ELF Notes being skipped. The first consumer of this functionality is the PVHboot support which needs to read the XEN_ELFNOTE_PHYS32_ENTRY ELF Note while loading the uncompressed kernel binary in order to discover the boot entry address for the x86/HVM direct boot ABI. Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-02-05 16:50:16 +01:00
Thomas Huth	6e66d0c648	hw/ppc/spapr: Add support for "-vga cirrus" The cirrus VGA card has been enabled in the PPC builds with commit `29f9cef39e` ("ppc: Include vga cirrus card into the compiling process") last year. It also works on the pseries machine, even SLOF contains support for this card, so we can also support this for the "-vga" parameter here. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:19 +11:00
Alexey Kardashevskiy	df269271a9	spapr: Drop unused parameters from fdt building helper spapr_load_rtas() handles now RTAS address and size information in the FDT so drop them from spapr_build_fdt(). While we are here, fix a small typo. Fixes: `3f5dabceba` "pseries: Consolidate construction of /rtas device tree node" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:19 +11:00
Cédric Le Goater	a28b9a5a8d	spapr: move the interrupt presenters under machine_data Next step is to remove them from under the PowerPCCPU Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:18 +11:00
Greg Kurz	21df5e4ffa	spapr: Forbid setting ic-mode for old machine types Machine types 3.0 and older only know about the legacy XICS backend. Make it clear by erroring out if the user tries to set ic-mode on such machines. Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:18 +11:00

1 2 3 4 5 ...

868 Commits