qemu-e2k

Author	SHA1	Message	Date
Daniel Henrique Barboza	f1aa45fffe	spapr: introduce SpaprMachineState::numa_assoc_array The next step to centralize all NUMA/associativity handling in the spapr machine is to create a 'one stop place' for all things ibm,associativity. This patch introduces numa_assoc_array, a 2 dimensional array that will store all ibm,associativity arrays of all NUMA nodes. This array is initialized in a new spapr_numa_associativity_init() function, called in spapr_machine_init(). It is being initialized with the same values used in other ibm,associativity properties around spapr files (i.e. all zeros, last value is node_id). The idea is to remove all hardcoded definitions and FDT writes of ibm,associativity arrays, doing instead a call to the new helper spapr_numa_write_associativity_dt() helper, that will be able to write the DT with the correct values. We'll start small, handling the trivial cases first. The remaining instances of ibm,associativity will be handled next. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200903220639.563090-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-09-08 10:08:43 +10:00
Greg Kurz	4a6891b838	spapr: Simplify error handling in spapr_phb_realize() The spapr_phb_realize() function has a local_err variable which is used to: 1) check failures of spapr_irq_findone() and spapr_irq_claim() 2) prepend extra information to the error message Recent work from Markus Armbruster highlighted we get better code when testing the return value of a function, rather than setting up all the local_err boiler plate. For similar reasons, it is now preferred to use ERRP_GUARD() and error_prepend() rather than error_propagate_prepend(). Since spapr_irq_findone() and spapr_irq_claim() return negative values in case of failure, do both changes. This is just cleanup, no functional impact. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <159707843851.1489912.6108405733810934642.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Reza Arbab	a6030d7e0b	spapr: Add a new level of NUMA for GPUs NUMA nodes corresponding to GPU memory currently have the same affinity/distance as normal memory nodes. Add a third NUMA associativity reference point enabling us to give GPU nodes more distance. This is guest visible information, which shouldn't change under a running guest across migration between different qemu versions, so make the change effective only in new (pseries > 5.0) machine types. Before, `numactl -H` output in a guest with 4 GPUs (nodes 2-5): node distances: node 0 1 2 3 4 5 0: 10 40 40 40 40 40 1: 40 10 40 40 40 40 2: 40 40 10 40 40 40 3: 40 40 40 10 40 40 4: 40 40 40 40 10 40 5: 40 40 40 40 40 10 After: node distances: node 0 1 2 3 4 5 0: 10 40 80 80 80 80 1: 40 10 80 80 80 80 2: 80 80 10 80 80 80 3: 80 80 80 10 80 80 4: 80 80 80 80 10 80 5: 80 80 80 80 80 10 These are the same distances as on the host, mirroring the change made to host firmware in skiboot commit f845a648b8cb ("numa/associativity: Add a new level of NUMA for GPU's"). Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Message-Id: <20200716225655.24289-1-arbab@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-07-20 09:21:39 +10:00
Greg Kurz	a4beb5f5d4	spapr_pci: Robustify support of PCI bridges Some recent error handling cleanups unveiled issues with our support of PCI bridges: 1) QEMU aborts when using non-standard PCI bridge types, unveiled by commit `7ef1553dac` "spapr_pci: Drop some dead error handling" $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge Unexpected error in object_property_find() at qom/object.c:1240: qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found Aborted (core dumped) This happens because we assume all PCI bridge types to have a "chassis_nr" property. This property only exists with the standard PCI bridge type "pci-bridge" actually. We could possibly revert `7ef1553dac` but it seems much simpler to check the presence of "chassis_nr" earlier. 2) QEMU abort if same "chassis_nr" value is used several times, unveiled by commit `d2623129a7` "qom: Drop parameter @errp of object_property_add() & friends" $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \ -device pci-bridge,chassis_nr=1 Unexpected error in object_property_try_add() at qom/object.c:1167: qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container') Aborted (core dumped) This happens because we assume that "chassis_nr" values are unique, but nobody enforces that and we end up generating duplicate DRC ids. The PCI code doesn't really care for duplicate "chassis_nr" properties since it is only used to initialize the "Chassis Number Register" of the bridge, with no functional impact on QEMU. So, even if passing the same value several times might look weird, it never broke anything before, so I guess we don't necessarily want to enforce strict checking in the PCI code now. Workaround both issues in the PAPR code: check that the bridge has a unique and non null "chassis_nr" when plugging it into its parent bus. Fixes: `05929a6c5d` ("spapr: Don't use bus number for building DRC ids") Fixes: `7ef1553dac` ("spapr_pci: Drop some dead error handling") Fixes: `d2623129a7` ("qom: Drop parameter @errp of object_property_add() & friends") Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159431476748.407044.16711294833569014964.stgit@bahia.lan> [dwg: Move check slightly to a better place] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-07-20 09:21:39 +10:00
Markus Armbruster	dcfe480544	error: Avoid unnecessary error_propagate() after error_setg() Replace error_setg(&err, ...); error_propagate(errp, err); by error_setg(errp, ...); Related pattern: if (...) { error_setg(&err, ...); goto out; } ... out: error_propagate(errp, err); return; When all paths to label out are that way, replace by if (...) { error_setg(errp, ...); return; } and delete the label along with the error_propagate(). When we have at most one other path that actually needs to propagate, and maybe one at the end that where propagation is unnecessary, e.g. foo(..., &err); if (err) { goto out; } ... bar(..., &err); out: error_propagate(errp, err); return; move the error_propagate() to where it's needed, like if (...) { foo(..., &err); error_propagate(errp, err); return; } ... bar(..., errp); return; and transform the error_setg() as above. In some places, the transformation results in obviously unnecessary error_propagate(). The next few commits will eliminate them. Bonus: the elimination of gotos will make later patches in this series easier to review. Candidates for conversion tracked down with this Coccinelle script: @@ identifier err, errp; expression list args; @@ - error_setg(&err, args); + error_setg(errp, args); ... when != err error_propagate(errp, err); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-34-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	5325cc34a2	qom: Put name parameter before value / visitor parameter The object_property_set_FOO() setters take property name and value in an unusual order: void object_property_set_FOO(Object obj, FOO_TYPE value, const char name, Error **errp) Having to pass value before name feels grating. Swap them. Same for object_property_set(), object_property_get(), and object_property_parse(). Convert callers with this Coccinelle script: @@ identifier fun = { object_property_get, object_property_parse, object_property_set_str, object_property_set_link, object_property_set_bool, object_property_set_int, object_property_set_uint, object_property_set, object_property_set_qobject }; expression obj, v, name, errp; @@ - fun(obj, v, name, errp) + fun(obj, name, v, errp) Chokes on hw/arm/musicpal.c's lcd_refresh() with the unhelpful error message "no position information". Convert that one manually. Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Convert manually. Fails to convert hw/rx/rx-gdbsim.c, because Coccinelle gets confused by RXCPU being used both as typedef and function-like macro there. Convert manually. The other files using RXCPU that way don't need conversion. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-27-armbru@redhat.com> [Straightforwad conflict with commit `2336172d9b` "audio: set default value for pcspk.iobase property" resolved]	2020-07-10 15:18:08 +02:00
Markus Armbruster	9bc6bfdf67	qdev: Drop qbus_set_hotplug_handler() parameter @errp qbus_set_hotplug_handler() is a simple wrapper around object_property_set_link(). object_property_set_link() fails when the property doesn't exist, is not settable, or its .check() method fails. These are all programming errors here, so passing &error_abort to qbus_set_hotplug_handler() is appropriate. Most of its callers do. Exceptions: * pcie_cap_slot_init(), shpc_init(), spapr_phb_realize() pass NULL, i.e. they ignore errors. * spapr_machine_init() passes &error_fatal. * s390_pcihost_realize(), virtio_serial_device_realize(), s390_pcihost_plug() pass the error to their callers. The latter two keep going after the error, which looks wrong. Drop the @errp parameter, and instead pass &error_abort to object_property_set_link(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200630090351.1247703-15-armbru@redhat.com>	2020-07-02 06:25:29 +02:00
Markus Armbruster	981c3dcd94	qdev: Convert to qdev_unrealize() with Coccinelle For readability, and consistency with qbus_realize(). Coccinelle script: @ depends on !(file in "hw/core/qdev.c")@ typedef DeviceState; DeviceState *dev; symbol false, error_abort; @@ - object_property_set_bool(OBJECT(dev), false, "realized", &error_abort); + qdev_unrealize(dev); @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@ expression dev; symbol false, error_abort; @@ - object_property_set_bool(OBJECT(dev), false, "realized", &error_abort); + qdev_unrealize(DEVICE(dev)); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200610053247.1583243-8-armbru@redhat.com>	2020-06-15 21:36:30 +02:00
Markus Armbruster	7ef1553dac	spapr_pci: Drop some dead error handling chassis_from_bus() uses object_property_get_uint() to get property "chassis_nr" of the bridge device. Failure would be a programming error. Pass &error_abort, and simplify its callers. Cc: David Gibson <david@gibson.dropbear.id.au> Cc: qemu-ppc@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-18-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
Markus Armbruster	b69c3c21a5	qdev: Unrealize must not fail Devices may have component devices and buses. Device realization may fail. Realization is recursive: a device's realize() method realizes its components, and device_set_realized() realizes its buses (which should in turn realize the devices on that bus, except bus_set_realized() doesn't implement that, yet). When realization of a component or bus fails, we need to roll back: unrealize everything we realized so far. If any of these unrealizes failed, the device would be left in an inconsistent state. Must not happen. device_set_realized() lets it happen: it ignores errors in the roll back code starting at label child_realize_fail. Since realization is recursive, unrealization must be recursive, too. But how could a partly failed unrealize be rolled back? We'd have to re-realize, which can fail. This design is fundamentally broken. device_set_realized() does not roll back at all. Instead, it keeps unrealizing, ignoring further errors. It can screw up even for a device with no buses: if the lone dc->unrealize() fails, it still unregisters vmstate, and calls listeners' unrealize() callback. bus_set_realized() does not roll back either. Instead, it stops unrealizing. Fortunately, no unrealize method can fail, as we'll see below. To fix the design error, drop parameter @errp from all the unrealize methods. Any unrealize method that uses @errp now needs an update. This leads us to unrealize() methods that can fail. Merely passing it to another unrealize method cannot cause failure, though. Here are the ones that do other things with @errp: * virtio_serial_device_unrealize() Fails when qbus_set_hotplug_handler() fails, but still does all the other work. On failure, the device would stay realized with its resources completely gone. Oops. Can't happen, because qbus_set_hotplug_handler() can't actually fail here. Pass &error_abort to qbus_set_hotplug_handler() instead. * hw/ppc/spapr_drc.c's unrealize() Fails when object_property_del() fails, but all the other work is already done. On failure, the device would stay realized with its vmstate registration gone. Oops. Can't happen, because object_property_del() can't actually fail here. Pass &error_abort to object_property_del() instead. * spapr_phb_unrealize() Fails and bails out when remove_drcs() fails, but other work is already done. On failure, the device would stay realized with some of its resources gone. Oops. remove_drcs() fails only when chassis_from_bus()'s object_property_get_uint() fails, and it can't here. Pass &error_abort to remove_drcs() instead. Therefore, no unrealize method can fail before this patch. device_set_realized()'s recursive unrealization via bus uses object_property_set_bool(). Can't drop @errp there, so pass &error_abort. We similarly unrealize with object_property_set_bool() elsewhere, always ignoring errors. Pass &error_abort instead. Several unrealize methods no longer handle errors from other unrealize methods: virtio_9p_device_unrealize(), virtio_input_device_unrealize(), scsi_qdev_unrealize(), ... Much of the deleted error handling looks wrong anyway. One unrealize methods no longer ignore such errors: usb_ehci_pci_exit(). Several realize methods no longer ignore errors when rolling back: v9fs_device_realize_common(), pci_qdev_unrealize(), spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(), virtio_device_realize(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-17-armbru@redhat.com>	2020-05-15 07:08:14 +02:00
David Gibson	05af7c77f5	spapr: Don't allow unplug of NVLink2 devices Currently, we can't properly handle unplug of NVLink2 devices, because we don't have code to tear down their special memory resources. There's not a lot of impetus to implement that: since hardware NVLink2 devices can't be hot unplugged, the guest side drivers don't usually support unplug anyway. Therefore, simply prevent unplug of NVLink2 devices. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2020-05-07 11:10:50 +10:00
David Gibson	7aab589976	spapr: Fix failure path for attempting to hot unplug PCI bridges For various technical reasons we can't currently allow unplug a PCI to PCI bridge on the pseries machine. spapr_pci_unplug_request() correctly generates an error message if that's attempted. But.. if the given errp is not error_abort or error_fatal, it doesn't actually stop trying to unplug the bridge anyway. Fixes: `14e714900f` "spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges" Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-04-07 08:55:11 +10:00
Damien Hedde	f703a04ce5	add device_legacy_reset function to prepare for reset api change Provide a temporary device_legacy_reset function doing what device_reset does to prepare for the transition with Resettable API. All occurrence of device_reset in the code tree are also replaced by device_legacy_reset. The new resettable API has different prototype and semantics (resetting child buses as well as the specified device). Subsequent commits will make the changeover for each call site individually; once that is complete device_legacy_reset() will be removed. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-2-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-01-30 16:02:03 +00:00
Marc-André Lureau	4f67d30b5e	qdev: set properties with device_class_set_props() The following patch will need to handle properties registration during class_init time. Let's use a device_class_set_props() setter. spatch --macro-file scripts/cocci-macro-file.h --sp-file ./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place --dir . @@ typedef DeviceClass; DeviceClass *d; expression val; @@ - d->props = val + device_class_set_props(d, val) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:15 +01:00
Markus Armbruster	8ca63ba8c2	error: Clean up unusual names of Error * variables Local Error * variables are conventionally named @err or @local_err, and Error ** parameters @errp. Naming local variables like parameters is confusing. Clean that up. Naming parameters like local variables is also confusing. Left for another day. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191204093625.14836-17-armbru@redhat.com>	2019-12-18 08:36:15 +01:00
Wei Yang	038adc2f58	core: replace getpagesize() with qemu_real_host_page_size There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-26 15:38:06 +02:00
David Gibson	8cbe71ecb8	spapr: Remove SpaprIrq::nr_msis The nr_msis value we use here has to line up with whether we're using legacy or modern irq allocation. Therefore it's safer to derive it based on legacy_irq_allocation rather than having SpaprIrq contain a canned value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	258aa5ce1c	spapr: Fold spapr_phb_lsi_qirq() into its single caller No point having a two-line helper that's used exactly once, and not likely to be used anywhere else in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-10-04 19:08:22 +10:00
Alexey Kardashevskiy	c4ec08ab70	spapr-pci: Stop providing assigned-addresses QEMU does not allocate PCI resources (BARs) in any case - coldplug devices are configured by the firmware and hotplug devices rely on the guest system to do the assignment via the PCI rescan mechanism. Also in order to create non empty "assigned-addresses", the device has to be enabled (i.e. PCI_COMMAND needs the MMIO bit set) first as otherwise io_regions[i].addr are -1, and devices are not enabled at this point. This removes "assigned-addresses" and leaves it to those who actually do resource allocation. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190927022651.71642-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 19:08:21 +10:00
Peter Maydell	f884294bd7	Machine + x86 queue, 2019-09-03 Bug fixes: * Fix die-id validation regression (Eduardo Habkost) * vmmouse: Properly reset state (Jan Kiszka) * hostmem-file: fix pmem file size check (Stefan Hajnoczi) * Keep query-hotpluggable-cpus output compatible with older QEMU if '-smp dies' is not set (Igor Mammedov) * migration: Do not re-read the clock on pre_save in case of paused guest (Maxiwell S. Garcia) Cleanups: * NUMA code cleanups (Tao Xu) * Remove stale externs from includes (Alex Bennée) Features: * qapi: report the default CPU type for each machine (Daniel P. Berrangé) -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl1u08EUHGVoYWJrb3N0 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaaKGQ//WQY+JQgXj2M7i5bAuz1lkR0QKJvh n++70ugqNmmlj1YH7LKmZNll0tz+auo25PLgEBOamPZPFQXxkRhPBxTUnOdQJ1UC bSwyRzHrFluVITXD/nGkIXgmP4rjXil5QBWTxneWb7zYsXDGBEnauZnC1YsXzc9T 5LISvc5zEz6pEzz5s3LdUJ947jTui/dDHVHupeyK/5bPkiPoKVoymsd4p8rvAmFw 4obMftjuFzklm8oLPKpHYAm7VvXj5yb92/FE/ZKdaahcLPGStWixiHJ7xJlGMBti GqcWca+2sdbsraOz4Pg05x//vbOgiwIECqgKJRlJSAnG7Roz7E6J/xXQIYIkhpkL Sn0+s181WtFeNFlQgEP056iTUCq81oBjek2XzgsXzuQyFip5IJGLLQox4E+w0ty6 7houoCkJD70ddl3sEj/koXi6rBeswNStfuxVYxUgwYa7HecehNvVD5q9NlElRhev Lce4szuWJzHBbhW5ubGmN6rCbXNa+mPrBunrDwbjApl12DFkr163dj9DsyN/DUgy MmfsgqpKZ+g18VSajck2QtvTg+9Oqv0bv3SWtpDwzDxS9VULz0r2wfcN9TZDipV0 qCZWg39BpCIgdd4s5L0q6bamC9+eSwoByFx54WrkoQT81odHJqUHNsCE9wnoNvmG aZlV3idjGmsTFiE= =u5HZ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging Machine + x86 queue, 2019-09-03 Bug fixes: * Fix die-id validation regression (Eduardo Habkost) * vmmouse: Properly reset state (Jan Kiszka) * hostmem-file: fix pmem file size check (Stefan Hajnoczi) * Keep query-hotpluggable-cpus output compatible with older QEMU if '-smp dies' is not set (Igor Mammedov) * migration: Do not re-read the clock on pre_save in case of paused guest (Maxiwell S. Garcia) Cleanups: * NUMA code cleanups (Tao Xu) * Remove stale externs from includes (Alex Bennée) Features: * qapi: report the default CPU type for each machine (Daniel P. Berrangé) # gpg: Signature made Tue 03 Sep 2019 21:57:37 BST # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: migration: Do not re-read the clock on pre_save in case of paused guest x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set i386/vmmouse: Properly reset state hostmem-file: fix pmem file size check qapi: report the default CPU type for each machine pc: Don't make die-id mandatory unless necessary pc: Improve error message when die-id is omitted pc: Fix error message on die-id validation numa: move numa global variable numa_info into MachineState numa: move numa global variable have_numa_distance into MachineState numa: move numa global variable nb_numa_nodes into MachineState hw/arm: simplify arm_load_dtb includes: remove stale [smp\|max]_cpus externs Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-09-04 14:44:54 +01:00
Tao Xu	7e721e7b10	numa: move numa global variable numa_info into MachineState Move existing numa global numa_info (renamed as "nodes") into NumaState. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-5-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-09-03 11:26:55 -03:00
Greg Kurz	572ebd08b3	spapr/pci: Convert types to QEMU coding style The QEMU coding style requires: - to typedef structured types (HACKING) - to use CamelCase for types and structure names (CODING_STYLE) Do that for PCI and Nvlink2 code. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156701644465.505236.2850655823182656869.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-29 09:46:07 +10:00
Daniel Henrique Barboza	02a1536eee	spapr_pci: remove all child functions in function zero unplug There is nothing wrong with how sPAPR handles multifunction PCI hot unplugs. The problem is that x86 does it simpler. Instead of removing each non-zero function and then removing function zero, x86 can remove any function of the slot to trigger the hot unplug. Libvirt will be directly impacted by this difference, in the (hopefully soon) PCI Multifunction hot plug/unplug support. For hot plugs, both x86 and sPAPR will operate the same way: a XML with all desired functions to be added, then consecutive hotplugs of all non-zero functions first, zero last. For hot unplugs, at least in the current state, a XML with the devices to be removed must also be provided because of how sPAPR operates - x86 does not need it - since any function unplug will unplug the whole PCIe slot. This difference puts extra strain in the management layer, which needs to either handle both archs differently in the unplug scenario or choose treat x86 like sPAPR, forcing x86 users to cope with sPAPR internals. This patch changes spapr_pci_unplug_request to handle the unplug of function zero differently. When removing function zero, instead of error-ing out if there are any remaining function DRCs which needs detaching, detach those. This has no effect in any existing scripts that are detaching the non-zero functions before function zero, and can be used by management as a shortcut to remove the whole PCI multifunction device without specifying each child function. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20190822195918.3307-1-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-29 09:46:07 +10:00
Greg Kurz	ea52074d3a	spapr/pci: Free MSIs during reset When the machine is reset, the MSI bitmap is cleared but the allocated MSIs are not freed. Some operating systems, such as AIX, can detect the previous configuration and assert. Empty the MSI cache, this performs the needed cleanup. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156415228410.1064338.4486161194061636096.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:39 +10:00
Greg Kurz	078eb6b05b	spapr/pci: Consolidate de-allocation of MSIs When freeing MSIs, we need to: - remove them from the machine's MSI bitmap - remove them from the IC backend - remove them from the PHB's MSI cache This is currently open coded in two places in rtas_ibm_change_msi(), and we're about to need this in spapr_phb_reset() as well. Instead of duplicating this code again, make it a destroy function for the PHB's MSI cache. Removing an MSI device from the cache will call the destroy function internally. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156415227855.1064338.5657793835271464648.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:17:39 +10:00
David Gibson	d15d4ad64f	spapr_pci: Allow 2MiB and 16MiB IOMMU pagesizes by default We've had the qemu and kernel KVM infrastructure to handle larger TCE page sizes for a while, but forgot to update the defaults to actually allow them. This turns that change on. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-21 17:16:22 +10:00
Markus Armbruster	a27bd6c779	Include hw/qdev-properties.h less In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>	2019-08-16 13:31:53 +02:00
Markus Armbruster	650d103d3e	Include hw/hw.h exactly where needed In my "build everything" tree, changing hw/hw.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The previous commits have left only the declaration of hw_error() in hw/hw.h. This permits dropping most of its inclusions. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-19-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	d645427057	Include migration/vmstate.h less In my "build everything" tree, changing migration/vmstate.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/hw.h supposedly includes it for convenience. Several other headers include it just to get VMStateDescription. The previous commit made that unnecessary. Include migration/vmstate.h only where it's still needed. Touching it now recompiles only some 1600 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-16-armbru@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	64552b6be4	Include hw/irq.h a lot less In my "build everything" tree, changing hw/irq.h triggers a recompile of some 5400 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/hw.h supposedly includes it for convenience. Several other headers include it just to get qemu_irq and.or qemu_irq_handler. Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to qemu/typedefs.h, and then include hw/irq.h only where it's still needed. Touching it now recompiles only some 500 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-13-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Greg Kurz	a2166410ad	spapr_pci: Unregister listeners before destroying the IOMMU address space Hot-unplugging a PHB with a VFIO device connected to it crashes QEMU: -device spapr-pci-host-bridge,index=1,id=phb1 \ -device vfio-pci,host=0034:01:00.3,id=vfio0 (qemu) device_del phb1 [ 357.207183] iommu: Removing device 0001:00:00.0 from group 1 [ 360.375523] rpadlpar_io: slot PHB 1 removed qemu-system-ppc64: memory.c:2742: do_address_space_destroy: Assertion `QTAILQ_EMPTY(&as->listeners)' failed. 'as' is the IOMMU address space, which indeed has a listener registered to by vfio_connect_container() when the VFIO device is realized. This listener is supposed to be unregistered by vfio_disconnect_container() when the VFIO device is finalized. Unfortunately, the VFIO device hasn't reached finalize yet at the time the PHB unrealize function is called, and address_space_destroy() gets called with the VFIO listener still being registered. All regions have just been unmapped from the address space. Listeners aren't needed anymore at this point. Remove them before destroying the address space. The VFIO code will try to remove them _again_ at device finalize, but it is okay since memory_listener_unregister() is idempotent. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156110925375.92514.11649846071216864570.stgit@bahia.lan> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Correct spelling error pointed out by aik] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-02 09:43:58 +10:00
Greg Kurz	8d08fa93bb	spapr_pci: Drop useless CONFIG_KVM ifdefery kvm_enabled() expands to (0) when CONFIG_KVM is not defined. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156051052977.224162.17306829691809502082.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-02 09:43:58 +10:00
Greg Kurz	7e10b57dd9	spapr_pci: Fix DRC owner in spapr_dt_pci_bus() spapr_dt_drc() scans the aliases of all DRConnector objects and filters the ones that it will use to generate OF properties according to their owner and type. Passing bus->parent_dev _works_ if bus belongs to a PCI bridge, but it is NULL if it is the PHB's root bus. This causes all allocated PCI DRCs to be associated to all PHBs (visible in their "ibm,drc-types" properties). As a consequence, hot unplugging a PHB results in PCI devices from the other PHBs to be unplugged as well, and likely confuses the guest. Use the same logic as in add_drcs() to ensure the correct owner is passed to spapr_dt_drc(). Fixes: `14e714900f` "spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges" Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156084737348.512412.3552825999605902691.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-02 09:43:58 +10:00
Philippe Mathieu-Daudé	740a19313b	spapr_pci: Fix potential NULL pointer dereference in spapr_dt_pci_bus() Commit `14e714900f` refactored the call to spapr_dt_drc(), introducing a potential NULL pointer dereference while accessing bus->parent_dev. A trivial audit show 'bus' is not null in the two places the static function spapr_dt_drc() is called. Since the 'bus' parameter is not NULL in both callers, remove remove the test on if (bus), and add an assert() to silent static analyzers. This fixes: /hw/ppc/spapr_pci.c: 1367 in spapr_dt_pci_bus() >>> CID 1401933: Null pointer dereferences (FORWARD_NULL) >>> Dereferencing null pointer "bus". 1367 ret = spapr_dt_drc(fdt, offset, OBJECT(bus->parent_dev), 1368 SPAPR_DR_CONNECTOR_TYPE_PCI); Fixes: `14e714900f` Reported-by: Coverity (CID 1401933) Suggested-by: Greg Kurz <groug@kaod.org> Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190613213406.22053-1-philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-02 09:43:58 +10:00
Peter Maydell	a050901d4b	ppc patch queue 2019-06-12 Next pull request against qemu-4.1. The big thing here is adding support for hot plug of P2P bridges, and PCI devices under P2P bridges on the "pseries" machine (which doesn't use SHPC). Other than that there's just a handful of fixes and small enhancements. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl0AkgwACgkQbDjKyiDZ s5Jyug//cwxP+t1t2CNHtffKwiXFzuEKx9YSNE1V0wog6aB40EbPKU72FzCq6FfA lev+pZWV9AwVMzFYe4VM/7Lqh7WFMYDT3DOXaZwfANs4471vYtgvPi21L2TBj80d hMszlyLWMLY9ByOzCxIq3xnbivGpA94G2q9rKbwXdK4T/5i62Pe3SIfgG+gXiiwW +YlHWCPX0I1cJz2bBs9ElXdl7ONWnn+7uDf7gNfWkTKuiUq6Ps7mxzy3GhJ1T7nz OFKmQ5dKzLJsgOULSSun8kWpXBmnPffkM3+fCE07edrWZVor09fMCk4HvtfaRy2K FFa2Kvzn/V/70TL+44dsSX4QcwdcHQztiaMO7UGPq9CMswx5L7gsNmfX6zvK1Nrb 1t7ORZKNJ72hMyvDPSMiGU2DpVjO3ZbBlSL4/xG8Qeal4An0kgkN5NcFlB/XEfnz dsKu9XzuGSeD1bWz1Mgcf1x7lPDBoHIKLcX6notZ8epP/otu4ywNFvAkPu4fk8s0 4jQGajIT7328SmzpjXClsmiEskpKsEr7hQjPRhu0hFGrhVc+i9PjkmbDl0TYRAf6 N6k6gJQAi+StJde2rcua1iS7Ra+Tka6QRKy+EctLqfqOKPb2VmkZ6fswQ3nfRRlT LgcTHt2iJcLeud2klVXs1e4pKXzXchkVyFL4ucvmyYG5VeimMzU= =ERgu -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190612' into staging ppc patch queue 2019-06-12 Next pull request against qemu-4.1. The big thing here is adding support for hot plug of P2P bridges, and PCI devices under P2P bridges on the "pseries" machine (which doesn't use SHPC). Other than that there's just a handful of fixes and small enhancements. # gpg: Signature made Wed 12 Jun 2019 06:47:56 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.1-20190612: ppc/xive: Make XIVE generate the proper interrupt types ppc/pnv: activate the "dumpdtb" option on the powernv machine target/ppc: Use tcg_gen_gvec_bitsel spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges spapr: Direct all PCI hotplug to host bridge, rather than P2P bridge spapr: Don't use bus number for building DRC ids spapr: Clean up DRC index construction spapr: Clean up spapr_drc_populate_dt() spapr: Clean up dt creation for PCI buses spapr: Clean up device tree construction for PCI devices spapr: Clean up device node name generation for PCI devices target/ppc: Fix lxvw4x, lxvh8x and lxvb16x spapr_pci: Improve error message Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-06-12 14:43:47 +01:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
David Gibson	14e714900f	spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges The pseries machine type already allows PCI hotplug and unplug via the PAPR mechanism, but only on the root bus of each PHB. This patch extends this to allow PCI to PCI bridges to be hotplugged, and devices to be hotplugged or unplugged under P2P bridges. For now we disallow hot unplugging P2P bridges. I tried doing that, but haven't managed to get it working, I think due to some guest side problems that need further investigation. To do this we dynamically construct DRCs when bridges are hot (or cold) added, which can in turn be used to hotplug devices under the bridge. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	05929a6c5d	spapr: Don't use bus number for building DRC ids DRC ids are more or less arbitrary, as long as they're consistent. For PCI, we notionally build them from the phb's index along with PCI bus number, slot and function number. Using bus number is broken, however, because it can change if the guest re-enumerates the PCI topology for whatever reason (e.g. due to hotplug of a bridge, which we don't support yet but want to). Fortunately, there's an alternative. Bridges are required to have a unique non-zero "chassis number" that we can use instead. Adjust the code to use that instead. This looks like it would introduce a guest visible breaking change, but in fact it does not because we don't yet ever use non-zero bus numbers. Both chassis and bus number are always 0 for the root bus, so there's no change for the existing cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	a1ec25b287	spapr: Clean up DRC index construction spapr_pci.c currently has several confusingly similarly named functions for various conversions between representations of DRCs. Make things clearer by renaming things in a more consistent XXX_from_YYY() manner and remove some called-only-once variants in favour of open coding. While we're at it, move this code together in the file to avoid some extra forward references, and split out construction and removal of DRCs for the host bridge into helper functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	9e7d38e8a3	spapr: Clean up spapr_drc_populate_dt() This makes some minor cleanups to spapr_drc_populate_dt(), renaming it to the shorter and more idiomatic spapr_dt_drc() along the way. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	466e883185	spapr: Clean up dt creation for PCI buses Device nodes for PCI bridges (both host and P2P) describe both the bridge device itself and the bus hanging off it, handling of this is a bit of a mess. spapr_dt_pci_device() has a few things it only adds for non-bridges, but always adds #address-cells and #size-cells which should only appear for bridges. But the walking down the subordinate PCI bus is done in one of its callers spapr_populate_pci_devices_dt(). The PHB dt creation in spapr_populate_pci_dt() open codes some similar logic to the bridge case. This patch consolidates things in a bunch of ways: * Bus specific dt info is now created in spapr_dt_pci_bus() used for both P2P bridges and the host bridge. This includes walking subordinate devices * spapr_dt_pci_device() now calls spapr_dt_pci_bus() when called on a P2P bridge * We do detection of bridges with the is_bridge field of the device class, rather than checking PCI config space directly, for consistency with qemu's core PCI code. * Several things are renamed for brevity and clarity Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	9d2134d81d	spapr: Clean up device tree construction for PCI devices spapr_create_pci_child_dt() is a trivial wrapper around spapr_populate_pci_child_dt(), but is the latter's only caller. So fold them together into spapr_dt_pci_device(), which closer matches our modern naming convention. While there, make a number of cleanups to the function itself. This is mostly using more temporary locals to avoid awkwardly long lines, and in some cases avoiding double reads of PCI config space variables. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
David Gibson	4782a8bb81	spapr: Clean up device node name generation for PCI devices spapr_populate_pci_child_dt() adds a 'name' property to the device tree node for PCI devices. This is never necessary for a flattened device tree, it is implicit in the name added when the node is constructed. In fact anything we do add to a 'name' property will be overwritten with something derived from the structural name in the guest firmware (but in fact it is exactly the same bytes). So, remove that. In addition, pci_get_node_name() is very simple, so fold it into its (also simple) sole caller spapr_create_pci_child_dt(). While we're there rename pci_find_device_name() to the shorter and more accurate dt_name_from_class(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2019-06-12 10:41:49 +10:00
Greg Kurz	7028293017	spapr_pci: Improve error message Every PHB must have a unique index. This is checked at realize but when a duplicate index is detected, an error message mentioning BUIDs is printed. This doesn't help much, especially since BUID is an internal concept that is no longer exposed to the user. Fix the message to mention the index property instead of BUID. As a bonus print a list of indexes already in use. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155915010892.2061314.10485622810149098411.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-06-12 10:41:49 +10:00
David Gibson	2f57db8a27	pcie: Simplify pci_adjust_config_limit() Since `c2077e2c` "pci: Adjust PCI config limit based on bus topology", pci_adjust_config_limit() has been used in the config space read and write paths to only permit access to extended config space on buses which permit it. Specifically it prevents access on devices below a vanilla-PCI bus via some combination of bridges, even if both the host bridge and the device itself are PCI-E. It accomplishes this with a somewhat complex call up the chain of bridges to see if any of them prohibit extended config space access. This is overly complex, since we can always know if the bus will support such access at the point it is constructed. This patch simplifies the test by using a flag in the PCIBus instance indicating whether extended configuration space is accessible. It is false for vanilla PCI buses. For PCI-E buses, it is true for root buses and equal to the parent bus's's capability otherwise. For the special case of sPAPR's paravirtualized PCI root bus, which acts mostly like vanilla PCI, but does allow extended config space access, we override the default value of the flag from the host bridge code. This should cause no behavioural change. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20190513061939.3464-4-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-05-29 18:00:57 -04:00
Greg Kurz	e8ec4adfe2	spapr: Drop duplicate PCI swizzle code LSI mapping in spapr currently open-codes standard PCI swizzling. It thus duplicates the code of pci_swizzle_map_irq_fn(). Expose the swizzling formula so that it can be used with a slot number when building the device tree. Simply drop pci_spapr_map_irq() and call pci_swizzle_map_irq_fn() instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155448184841.8446.13959787238854054119.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00
Greg Kurz	c413605ba6	spapr_pci: Get rid of duplicate code for node name creation According to the changelog of `298a971024`, SpaprPhbState::dtbusname was introduced to "make it easier to relate the guest and qemu views of memory to each other", hence its name. Use it when creating the PHB node to avoid code duplication. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155448184292.8446.8225650773162648595.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00
Alexey Kardashevskiy	ec132efaa8	spapr: Support NVIDIA V100 GPU with NVLink2 NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver implements special regions for such GPUs and emulates an NVLink bridge. NVLink2-enabled POWER9 CPUs also provide address translation services which includes an ATS shootdown (ATSD) register exported via the NVLink bridge device. This adds a quirk to VFIO to map the GPU memory and create an MR; the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses this to get the MR and map it to the system address space. Another quirk does the same for ATSD. This adds additional steps to sPAPR PHB setup: 1. Search for specific GPUs and NPUs, collect findings in sPAPRPHBState::nvgpus, manage system address space mappings; 2. Add device-specific properties such as "ibm,npu", "ibm,gpu", "memory-block", "link-speed" to advertise the NVLink2 function to the guest; 3. Add "mmio-atsd" to vPHB to advertise the ATSD capability; 4. Add new memory blocks (with extra "linux,memory-usable" to prevent the guest OS from accessing the new memory until it is onlined) and npuphb# nodes representing an NPU unit for every vPHB as the GPU driver uses it for link discovery. This allocates space for GPU RAM and ATSD like we do for MMIOs by adding 2 new parameters to the phb_placement() hook. Older machine types set these to zero. This puts new memory nodes in a separate NUMA node to as the GPU RAM needs to be configured equally distant from any other node in the system. Unlike the host setup which assigns numa ids from 255 downwards, this adds new NUMA nodes after the user configures nodes or from 1 if none were configured. This adds requirement similar to EEH - one IOMMU group per vPHB. The reason for this is that ATSD registers belong to a physical NPU so they cannot invalidate translations on GPUs attached to another NPU. It is guaranteed by the host platform as it does not mix NVLink bridges or GPUs from different NPU in the same IOMMU group. If more than one IOMMU group is detected on a vPHB, this disables ATSD support for that vPHB and prints a warning. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for vfio portions] Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190312082103.130561-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 10:41:23 +10:00
Greg Kurz	4560116e42	spapr_pci: Fix broken naming of PCI bus Recent commit `5cf0d326a0` fixed a regression which was preventing the guest to access the extended config space of a PCIe device. This was done by introducing a new PCI bus subtype for PAPR. The original fix was causing PCI busses to be named "spapr-pci-host-bridge-root-bus.N" instead of "pci.N", which was making upper layers unhappy of course. This got worked around by hardcoding the PCI bus name to "pci.0", but this only works for the default PHB. And we're now hitting: # qemu-system-ppc64 \ -device spapr-pci-host-bridge,index=1 \ -device e1000e,bus=pci.0 \ -device e1000e,bus=pci.1 qemu-system-ppc64: -device e1000e,bus=pci.1: Bus 'pci.1' not found David already posted some patches [1] to control PCI extended config space accesses with a new flag in the base PCI bus class instead of subtyping. These patches are a bit more intrusive though, and are targetted for 4.1. When no name is passed to pci_register_bus(), the core device code generates a lowercase name based on the QOM typename. The typename for the base PCI bus class is "PCI", hence the "pci.0", "pci.1" bus names. Rename the type of the PAPR PCI bus to "pci", so that the QOM code can generate proper names. This is a hack but it is enough to fix the regression. And all this will be reworked properly in 4.1. [1] https://patchwork.ozlabs.org/project/qemu-devel/list/?series=100486 Fixes: `5cf0d326a0` Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155500034416.646888.1307366522340665522.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-12 12:23:02 +10:00
Greg Kurz	5cf0d326a0	spapr_pci: Fix extended config space accesses The PAPR PHB acts as a legacy PCI bus but it allows PCIe extended config space accesses anyway (for pseries-2.9 and newer machine types). Introduce a specific PCI bus subtype to inform the common PCI code about that. Fixes: `c2077e2ca0` Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155414130834.574858.16502276132110219890.stgit@bahia.lan> [dwg: Apply fix so we don't rename the default pci bus, breaking everything] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-09 15:03:10 +10:00

1 2 3 4 5

230 Commits