qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Cédric Le Goater	a28b9a5a8d	spapr: move the interrupt presenters under machine_data Next step is to remove them from under the PowerPCCPU Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:18 +11:00
Greg Kurz	21df5e4ffa	spapr: Forbid setting ic-mode for old machine types Machine types 3.0 and older only know about the legacy XICS backend. Make it clear by erroring out if the user tries to set ic-mode on such machines. Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:18 +11:00
Thomas Huth	1ac24c91bb	hw/ppc/spapr: Encode the SCSI channel (bus) in the SRP LUNs In hw/scsi/spapr_vio.c we declare that the controller supports multiple buses by specifying "max_channel = 7" there. So in the code that fixes up the device tree nodes, we must encode the channel number (a.k.a. bus number in the "Logical unit addressing format" table of SAM5) into the 64-bit LUN, too. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1663160 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-04 18:44:18 +11:00
Eduardo Habkost	6c36bddf53	machine: Use shorter format for GlobalProperty arrays Instead of verbose arrays with 4 lines for each entry, make each entry take only one line. This makes long arrays that couldn't fit in the screen become short and readable. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190107193020.21744-4-ehabkost@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-09 22:10:00 -02:00
Eduardo Habkost	e0985450e1	machine: Eliminate unnecessary stringify() usage stringify() is useful when we need to use macros in compat_props (like when we set virtio-baloon-pci.class=PCI_CLASS_MEMORY_RAM at pc_i440fx_1_0_machine_options()), but it is pointless when we are already providing a number literal. Replace stringify() with string literals when appropriate. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190107193020.21744-3-ehabkost@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-09 22:10:00 -02:00
Eduardo Habkost	b3bcb3cedf	spapr: Eliminate SPAPR_PCI_2_7_MMIO_WIN_SIZE macro The macro is only used in one place, where the purpose of the value is obvious. Eliminate the macro so we don't need to rely on stringify(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190107193020.21744-2-ehabkost@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-09 22:10:00 -02:00
Cédric Le Goater	13db0cd9b8	spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS The 'dual' sPAPR IRQ backend supports both interrupt mode, XIVE exploitation mode and the legacy compatibility mode (XICS). both modes are not supported at the same time. The machine starts with the legacy mode and a new interrupt mode can then be negotiated by the CAS process. In this case, the new mode is activated after a reset to take into account the required changes in the machine. These impact the device tree layout, the interrupt presenter object and the exposed MMIO regions in the case of XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:14 +11:00
Cédric Le Goater	e502202c9b	spapr: return from post_load method when RTC import fails The error value can be squashed by the section handling radix migration. Simply bail out if an error occurs when the RTC offset is imported. This fixes the Coverity issue CID 1398591. Fixes: `d39c90f5f3` ("spapr: Fix migration of Radix guests") Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:14 +11:00
Cédric Le Goater	3ff73aa241	ppc: replace the 'Object intc' by a 'ICPState icp' pointer under the CPU Now that the 'intc' pointer is only used by the XICS interrupt mode, let's make things clear and use a XICS type and name. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:14 +11:00
Greg Kurz	1da85c2ae6	spapr_pci: Define SPAPR_MAX_PHBS in hw/pci-host/spapr.h PHB hotplug will bring more users for it. Let's define it along with the PHB defines from which it is derived for simplicity. While here fix a misleading comment about manual placement, which was abandoned with `30b3bc5aa9`. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:14 +11:00
Greg Kurz	999c9caf2e	spapr: move spapr_create_phb() to core machine code This function is only used when creating the default PHB. Let's rename it and move it to the core machine code for clarity. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:14 +11:00
Alexey Kardashevskiy	fea35ca4b8	ppc/spapr: Receive and store device tree blob from SLOF SLOF receives a device tree and updates it with various properties before switching to the guest kernel and QEMU is not aware of any changes made by SLOF. Since there is no real RTAS (QEMU implements it), it makes sense to pass the SLOF final device tree to QEMU to let it implement RTAS related tasks better, such as PCI host bus adapter hotplug. Specifially, now QEMU can find out the actual XICS phandle (for PHB hotplug) and the RTAS linux,rtas-entry/base properties (for firmware assisted NMI - FWNMI). This stores the initial DT blob in the sPAPR machine and replaces it in the KVMPPC_H_UPDATE_DT (new private hypercall) handler. This adds an @update_dt_enabled machine property to allow backward migration. SLOF already has a hypercall since https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183 This makes use of the new fdt_check_full() helper. In order to allow the configure script to pick the correct DTC version, this adjusts the DTC presense test. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:13 +11:00
Laurent Vivier	c24ba3d0a3	spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITY H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain designation associated with the identifier input parameter This fixes a crash when we try to hotplug a CPU in memory-less and CPU-less numa node. In this case, the kernel tries to online the node, but without the information provided by this h-call, the node id, it cannot and the CPU is started while the node is not onlined. It also removes the warning message from the kernel: VPHN is not supported. Disabling polling.. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:13 +11:00
Marc-André Lureau	a1c3c562e2	include: remove compat.h The header is now empty. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	c4fc5695b7	compat: replace PC_COMPAT_2_1 & HW_COMPAT_2_1 macros Use static arrays instead. I decided to rename the conflicting pc_compat_2_1() function with pc_compat_2_1_fn(). Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	1c30044e1a	compat: replace PC_COMPAT_2_2 & HW_COMPAT_2_2 macros Use static arrays instead. I decided to rename the conflicting pc_compat_2_2() function with pc_compat_2_2_fn(). Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	8995dd9009	compat: replace PC_COMPAT_2_3 & HW_COMPAT_2_3 macros Use static arrays instead. I decided to rename the conflicting pc_compat_2_3() function with pc_compat_2_3_fn(). Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	2f99b9c273	compat: replace PC_COMPAT_2_4 & HW_COMPAT_2_4 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	fe759610d5	compat: replace PC_COMPAT_2_5 & HW_COMPAT_2_5 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	ff8f261f11	compat: replace PC_COMPAT_2_6 & HW_COMPAT_2_6 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	5a995064db	compat: replace PC_COMPAT_2_7 & HW_COMPAT_2_7 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	edc24ccda4	compat: replace PC_COMPAT_2_8 & HW_COMPAT_2_8 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	3e8031525a	compat: replace PC_COMPAT_2_9 & HW_COMPAT_2_9 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:42 +04:00
Marc-André Lureau	503224f4c8	compat: replace PC_COMPAT_2_10 & HW_COMPAT_2_10 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Marc-André Lureau	43df70a9dd	compat: replace PC_COMPAT_2_11 & HW_COMPAT_2_11 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Marc-André Lureau	0d47310b03	compat: replace PC_COMPAT_2_12 & HW_COMPAT_2_12 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Marc-André Lureau	ddb3235de1	compat: replace PC_COMPAT_3_0 & HW_COMPAT_3_0 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Marc-André Lureau	abd93cc7df	compat: replace PC_COMPAT_3_1 & HW_COMPAT_3_1 macros Use static arrays instead. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Marc-André Lureau	88cbe07374	machine: move compat properties out of globals Move the compat arrays inside functions that use them. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Marc-André Lureau	b66bbee39f	hw: apply machine compat properties without touching globals Similarly to accel properties, move compat properties out of globals registration, and apply the machine compat properties during device_post_init(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Marc-André Lureau	fa386d989d	machines: replace COMPAT define with a static array Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>	2019-01-07 16:18:41 +04:00
Cédric Le Goater	34a6b015a9	spapr: change default CPU type to POWER9 Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:43 +11:00
Cédric Le Goater	3ba3d0bc33	spapr: introduce an 'ic-mode' machine option This option is used to select the interrupt controller mode (XICS or XIVE) with which the machine will operate. XICS being the default mode for now. When running a machine with the XIVE interrupt mode backend, the guest OS is required to have support for the XIVE exploitation mode. In the case of legacy OS, the mode selected by CAS should be XICS and the OS should fail to boot. However, QEMU could possibly detect it, terminate the boot process and reset to stop in the SLOF firmware. This is not yet handled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:43 +11:00
Cédric Le Goater	db592b5b16	spapr: add an extra OV5 field to the sPAPR IRQ backend The interrupt modes supported by the hypervisor are advertised to the guest with new bits definitions of the option vector 5 of property "ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are defined as follow : 0b00 PAPR 2.7 and earlier (Legacy systems) 0b01 XIVE Exploitation mode only 0b10 Either available If the client/guest selects the XIVE interrupt mode, it informs the hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00 value indicates the use of the XICS interrupt mode (Legacy systems). The sPAPR IRQ backend is extended with these definitions and the values are directly used to populate the "ibm,arch-vec-5-platform-support" property. The interrupt mode is advertised under TCG and under KVM. Although a KVM XIVE device is not yet available, the machine can still operate with kernel_irqchip=off. However, we apply a restriction on the CPU which is required to be a POWER9 when a XIVE interrupt controller is in use. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:43 +11:00
Cédric Le Goater	b2e2247716	spapr: add a 'reset' method to the sPAPR IRQ backend For the time being, the XIVE reset handler updates the OS CAM line of the vCPU as it is done under a real hypervisor when a vCPU is scheduled to run on a HW thread. This will let the XIVE presenter engine find a match among the NVTs dispatched on the HW threads. This handler will become even more useful when we introduce the machine supporting both interrupt modes, XIVE and XICS. In this machine, the interrupt mode is chosen by the CAS negotiation process and activated after a reset. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:35 +11:00
Cédric Le Goater	1c53b06c03	spapr: extend the sPAPR IRQ backend for XICS migration Introduce a new sPAPR IRQ handler to handle resend after migration when the machine is using a KVM XICS interrupt controller model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:39:13 +11:00
Cédric Le Goater	6e21de4a50	spapr: add device tree support for the XIVE exploitation mode The XIVE interface for the guest is described in the device tree under the "interrupt-controller" node. A couple of new properties are specific to XIVE : - "reg" contains the base address and size of the thread interrupt managnement areas (TIMA), for the User level and for the Guest OS level. Only the Guest OS level is taken into account today. - "ibm,xive-eq-sizes" the size of the event queues. One cell per size supported, contains log2 of size, in ascending order. - "ibm,xive-lisn-ranges" the IRQ interrupt number ranges assigned to the guest for the IPIs. and also under the root node : - "ibm,plat-res-int-priorities" contains a list of priorities that the hypervisor has reserved for its own use. OPAL uses the priority 7 queue to automatically escalate interrupts for all other queues (DD2.X POWER9). So only priorities [0..6] are allowed for the guest. Extend the sPAPR IRQ backend with a new handler to populate the DT with the appropriate "interrupt-controller" node. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:39:07 +11:00
Cédric Le Goater	1a518e7693	spapr: export and rename the xics_max_server_number() routine The XIVE sPAPR IRQ backend will use it to define the number of ENDs of the IC controller. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:29:10 +11:00
Cédric Le Goater	fab397d84a	spapr: introduce a spapr_irq_init() routine Initialize the MSI bitmap from it as this will be necessary for the sPAPR IRQ backend for XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:28:47 +11:00
Cédric Le Goater	482969d680	spapr: initialize VSMT before initializing the IRQ backend We will need to use xics_max_server_number() to create the sPAPRXive object modeling the interrupt controller of the machine which is created before the CPUs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Fix style nit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:28:39 +11:00
Greg Kurz	118abc71ed	spapr: drop redundant statement in spapr_populate_drconf_memory() Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <laurent@vivier.eu>	2018-12-21 09:24:23 +11:00
Serhii Popovych	3908a24fcb	spapr: Fix ibm,max-associativity-domains property number of nodes Laurent Vivier reported off by one with maximum number of NUMA nodes provided by qemu-kvm being less by one than required according to description of "ibm,max-associativity-domains" property in LoPAPR. It appears that I incorrectly treated LoPAPR description of this property assuming it provides last valid domain (NUMA node here) instead of maximum number of domains. ### Before hot-add (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 0 MB node 0 plugged: 0 MB node 1 cpus: node 1 size: 1024 MB node 1 plugged: 0 MB node 2 cpus: node 2 size: 0 MB node 2 plugged: 0 MB $ numactl -H available: 2 nodes (0-1) node 0 cpus: 0 node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: node 1 size: 999 MB node 1 free: 658 MB node distances: node 0 1 0: 10 40 1: 40 10 ### Hot-add (qemu) object_add memory-backend-ram,id=mem0,size=1G (qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2 (qemu) [ 87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ... <there is no "Initmem setup node 2 [mem 0xHEX-0xHEX]"> [ 87.705128] lpar: Attempting to resize HPT to shift 21 ... <HPT resize messages> ### After hot-add (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 0 MB node 0 plugged: 0 MB node 1 cpus: node 1 size: 1024 MB node 1 plugged: 0 MB node 2 cpus: node 2 size: 1024 MB node 2 plugged: 1024 MB $ numactl -H available: 2 nodes (0-1) ^^^^^^^^^^^^^^^^^^^^^^^^ Still only two nodes (and memory hot-added to node 0 below) node 0 cpus: 0 node 0 size: 1024 MB node 0 free: 1021 MB node 1 cpus: node 1 size: 999 MB node 1 free: 658 MB node distances: node 0 1 0: 10 40 1: 40 10 After fix applied numactl(8) reports 3 nodes available and memory plugged into node 2 as expected. From David Gibson: ------------------ Qemu makes a distinction between "non NUMA" (nb_numa_nodes == 0) and "NUMA with one node" (nb_numa_nodes == 1). But from a PAPR guests's point of view these are equivalent. I don't want to present two different cases to the guest when we don't need to, so even though the guest can handle it, I'd prefer we put a '1' here for both the nb_numa_nodes == 0 and nb_numa_nodes == 1 case. This consolidates everything discussed previously on mailing list. Fixes: `da9f80fbad` ("spapr: Add ibm,max-associativity-domains property") Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Serhii Popovych <spopovyc@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2018-12-21 09:24:23 +11:00
Eduardo Habkost	3420340988	spapr: Delete instance_options functions Now that all instance_options functions for spapr are empty, delete them. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20181205205827.19387-5-ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-12-11 15:45:22 -02:00
Eduardo Habkost	f6d0656bc1	spapr: Use default_machine_opts to set suppress_vmdesc Instead of setting suppress_vmdesc at instance_init time, set default_machine_opts on spapr_machine_2_2_class_options() to implement equivalent behavior. This will let us eliminate the need for separate instance_init functions for each spapr machine-type. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20181205205827.19387-4-ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-12-11 15:45:22 -02:00
Eduardo Habkost	a140c199f4	spapr: Use default_machine_opts to set use_hotplug_event_source Instead of setting use_hotplug_event_source at instance_init time, set default_machine_opts on spapr_machine_2_7_class_options() to implement equivalent behavior. This will let us eliminate the need for separate instance_init functions for each spapr machine-type. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20181205205827.19387-3-ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-12-11 15:45:22 -02:00
Alex Williamson	84e060bf90	q35/440fx/arm/spapr: Add QEMU 4.0 machine type Including all machine types that might have a pcie-root-port. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <154394083644.28192.8501647946108201466.stgit@gimli.home> Reviewed-by: Eric Auger <eric.auger@redhat.com> [ehabkost: fixed accidental recursion at spapr_machine_3_1_class_options()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-12-11 15:45:22 -02:00
Suraj Jitindar Singh	b9a477b725	ppc/spapr_caps: Add SPAPR_CAP_NESTED_KVM_HV Add the spapr cap SPAPR_CAP_NESTED_KVM_HV to be used to control the availability of nested kvm-hv to the level 1 (L1) guest. Assuming a hypervisor with support enabled an L1 guest can be allowed to use the kvm-hv module (and thus run it's own kvm-hv guests) by setting: -machine pseries,cap-nested-hv=true or disabled with: -machine pseries,cap-nested-hv=false Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-11-08 13:08:35 +11:00
Thomas Huth	0e947a89ce	hw/ppc/spapr_rng: Introduce CONFIG_SPAPR_RNG switch for spapr_rng.c The spapr-rng device is suboptimal when compared to virtio-rng, so users might want to disable it in their builds. Thus let's introduce a proper CONFIG switch to allow us to compile QEMU without this device. The function spapr_rng_populate_dt is required for linking, so move it to a different location. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-11-08 12:04:40 +11:00
David Hildenbrand	946d6154ab	memory-device: add and use memory_device_get_region_size() We will factor out get_memory_region() from pc-dimm to memory device code soon. Once that is done, get_region_size() can be implemented generically and essentially be replaced by memory_device_get_region_size (and work only on get_memory_region()). We have some users of get_memory_region() (spapr and pc-dimm code) that are only interested in the size. So let's rework them to use memory_device_get_region_size() first, then we can factor out get_memory_region() and eventually remove get_region_size() without touching the same code multiple times. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20181005092024.14344-10-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-10-24 06:44:59 -03:00
David Hildenbrand	fd3416f5eb	pc-dimm: pass PCDIMMDevice to pc_dimm_.*plug We're plugging/unplugging a PCDIMMDevice, so directly pass this type instead of a more generic DeviceState. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20181005092024.14344-5-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-10-24 06:44:59 -03:00
Cédric Le Goater	0976efd51b	spapr_pci: add an extra 'nr_msis' argument to spapr_populate_pci_dt So that we don't have to call qdev_get_machine() to get the machine class and the sPAPRIrq backend holding the number of MSIs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-09-25 11:12:25 +10:00
Cédric Le Goater	ae83740237	spapr: increase the size of the IRQ number space The new layout using static IRQ number does not leave much space to the dynamic MSI range, only 0x100 IRQ numbers. Increase the total number of IRQS for newer machines and introduce a legacy XICS backend for pre-3.1 machines to maintain compatibility. For the old backend, provide a 'nr_msis' value covering the full IRQ number space as it does not use the bitmap allocator to allocate MSI interrupt numbers. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-09-25 11:12:25 +10:00
Sam Bobroff	ecda255eba	spapr: Correct reference count on spapr-cpu-core spapr_init_cpus() currently creates spapr-cpu-core objects via object_new() and setting their realized property to true. This leaves their reference count at two, because object_new() adds an initial reference and the realization attaches them to a default parent object which also increments the reference count. This causes a problem if one of these cores is hot unplugged: no delete event is generated for it because it's reference count doesn't reach zero when it is detached from it's parent. Correct this by adding a call to object_unref() in spapr_init_cpus(). Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-08-30 15:58:42 +10:00
Emilio G. Cota	eceba3477e	spapr: fix leak of rev array Introduced in `04d595b300` ("spapr: do not use CPU_FOREACH_REVERSE", 2018-08-23) Fixes: CID1395181 Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-08-28 11:31:23 +10:00
Peter Maydell	3c825bb7c1	* x86 TCG fixes for 64-bit call gates (Andrew) * qumu-guest-agent freeze-hook tweak (Christian) * pm_smbus improvements (Corey) * Move validation to pre_plug for pc-dimm (David) * Fix memory leaks (Eduardo, Marc-André) * synchronization profiler (Emilio) * Convert the CPU list to RCU (Emilio) * LSI support for PPR Extended Message (George) * vhost-scsi support for protection information (Greg) * Mark mptsas as a storage device in the help (Guenter) * checkpatch tweak cherry-picked from Linux (me) * Typos, cleanups and dead-code removal (Julia, Marc-André) * qemu-pr-helper support for old libmultipath (Murilo) * Annotate fallthroughs (me) * MemoryRegionOps cleanup (me, Peter) * Make s390 qtests independent from libqos, which doesn't actually support it (me) * Make cpu_get_ticks independent from BQL (me) * Introspection fixes (Thomas) * Support QEMU_MODULE_DIR environment variable (ryang) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlt+5OYUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPtxwf8CQM/F+0L+EKeYfYcVgVZsDhhOkLj Pm61q0bZsWKLby5jCqIDYw7Z/vodJnSS1DO0slIRoXxvQ9DwlkbBnBy/aG/E9U0q WF1vbCezibDIt7sGcsu9F5zXU9eqe+E6dZfxFrv8FQSOFVxn34TfeJagWLCtzg0d LnVTF/e4zJD8IQiM7w6lJQxua3fz13ssPEg2KnMkguDhACMwvZ/K/cA2AJkHRMhY sroPMwLHlrF1NOoeCIrWxYUmSGCRCAy1DmiPGiiSs0yBq/dL0UkAa5Eu6HMQ7rgI zUff3JDmzEjixUSIEbpVRN+yPCN0/ACSOpJUrKLDxXbc4nZ+PBQ04YpyPQ== =UZiV -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * x86 TCG fixes for 64-bit call gates (Andrew) * qumu-guest-agent freeze-hook tweak (Christian) * pm_smbus improvements (Corey) * Move validation to pre_plug for pc-dimm (David) * Fix memory leaks (Eduardo, Marc-André) * synchronization profiler (Emilio) * Convert the CPU list to RCU (Emilio) * LSI support for PPR Extended Message (George) * vhost-scsi support for protection information (Greg) * Mark mptsas as a storage device in the help (Guenter) * checkpatch tweak cherry-picked from Linux (me) * Typos, cleanups and dead-code removal (Julia, Marc-André) * qemu-pr-helper support for old libmultipath (Murilo) * Annotate fallthroughs (me) * MemoryRegionOps cleanup (me, Peter) * Make s390 qtests independent from libqos, which doesn't actually support it (me) * Make cpu_get_ticks independent from BQL (me) * Introspection fixes (Thomas) * Support QEMU_MODULE_DIR environment variable (ryang) # gpg: Signature made Thu 23 Aug 2018 17:46:30 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (69 commits) KVM: cleanup unnecessary #ifdef KVM_CAP_... target/i386: update MPX flags when CPL changes i2c: pm_smbus: Add the ability to force block transfer enable i2c: pm_smbus: Don't delay host status register busy bit when interrupts are enabled i2c: pm_smbus: Add interrupt handling i2c: pm_smbus: Add block transfer capability i2c: pm_smbus: Make the I2C block read command read-only i2c: pm_smbus: Fix the semantics of block I2C transfers i2c: pm_smbus: Clean up some style issues pc-dimm: assign and verify the "addr" property during pre_plug pc: drop memory region alignment check for 0 util/oslib-win32: indicate alignment for qemu_anon_ram_alloc() pc-dimm: assign and verify the "slot" property during pre_plug ipmi: Use proper struct reference for BT vmstate vhost-scsi: expose 't10_pi' property for VIRTIO_SCSI_F_T10_PI vhost-scsi: unify vhost-scsi get_features implementations vhost-user-scsi: move host_features into VHostSCSICommon cpus: allow cpu_get_ticks out of BQL cpus: protect TimerState writes with a spinlock seqlock: add QemuLockable support ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-08-23 19:03:54 +01:00
David Hildenbrand	b0e624435b	pc-dimm: assign and verify the "addr" property during pre_plug We can assign and verify the address before realizing and trying to plug. reading/writing the address property should never fail for DIMMs, so let's reduce error handling a bit by using &error_abort. Getting access to the memory region now might however fail. So forward errors from get_memory_region() properly. As all memory devices should use the alignment of the underlying memory region for guest physical address asignment, do detection of the alignment in pc_dimm_pre_plug(), but allow pc.c to overwrite the alignment for compatibility handling. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180801133444.11269-5-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-08-23 18:46:25 +02:00
David Hildenbrand	8f1ffe5be8	pc-dimm: assign and verify the "slot" property during pre_plug We can assign and verify the slot before realizing and trying to plug. reading/writing the slot property should never fail, so let's reduce error handling a bit by using &error_abort. To do this during pre_plug, add and use (x86, ppc) pc_dimm_pre_plug(). Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180801133444.11269-2-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-08-23 18:46:25 +02:00
Emilio G. Cota	04d595b300	spapr: do not use CPU_FOREACH_REVERSE This paves the way for implementing the CPU list with an RCU list, which cannot be traversed in reverse order. Note that this is the only caller of CPU_FOREACH_REVERSE. Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20180819091335.22863-11-cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-08-23 18:46:25 +02:00
Cédric Le Goater	ef01ed9d19	spapr: introduce a IRQ controller backend to the machine This proposal moves all the related IRQ routines of the sPAPR machine behind a sPAPR IRQ backend interface 'spapr_irq' to prepare for future changes. First of which will be to increase the size of the IRQ number space, then, will follow a new backend for the POWER9 XIVE IRQ controller. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-08-21 14:28:45 +10:00
Cédric Le Goater	82cffa2eb2	spapr: introduce a fixed IRQ number space This proposal introduces a new IRQ number space layout using static numbers for all devices, depending on a device index, and a bitmap allocator for the MSI IRQ numbers which are negotiated by the guest at runtime. As the VIO device model does not have a device index but a "reg" property, we introduce a formula to compute an IRQ number from a "reg" value. It should minimize most of the collisions. The previous layout is kept in pre-3.1 machines raising the 'legacy_irq_allocation' machine class flag. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-08-21 14:28:45 +10:00
Cédric Le Goater	d45360d93d	spapr: Add a pseries-3.1 machine type Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-08-21 14:28:45 +10:00
Mark Cave-Ayland	907aac2f6a	fw_cfg: ignore suffixes in the bootdevice list dependent on machine class For the older machines (such as Mac and SPARC) the DT nodes representing bootdevices for disk nodes are irregular for mainly historical reasons. Since the majority of bootdevice nodes for these machines either do not have a separate disk node or require different (custom) names then it is much easier for processing to just disable all suffixes for a particular machine. Introduce a new ignore_boot_device_suffixes MachineClass property to control bootdevice suffix generation, defaulting to false in order to preserve compatibility. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20180810124027.10698-1-mark.cave-ayland@ilande.co.uk> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-08-16 22:27:43 -03:00
David Gibson	ccc2cef8b3	spapr: Correct inverted test in spapr_pc_dimm_node() This function was introduced between v2.11 and v2.12 to replace obsolete ways of specifying the NUMA nodes for DIMMs. It's used to find the correct node for an LMB, by locating which DIMM object it lies within. Unfortunately, one of the checks is inverted, so we check whether the address is less than two different things, rather than actually checking a range. This introduced a regression, meaning that after a reboot qemu will advertise incorrect node information for memory to the guest. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com>	2018-07-16 11:18:09 +10:00
Peter Maydell	b07cd3e748	ppc patch queue 2018-07-03 Here's a last minue pull request before today's soft freeze. Ideally I would have sent this earlier, but I was waiting for a couple of extra fixes I knew were close. And the freeze crept up on me, like always. Most of the changes here are bugfixes in any case. There are some cleanups as well, which have been in my staging tree for a little while. There are a couple of truly new features (some extensions to the sam460ex platform), but these are low risk, since they only affect a new and not really stabilized machine type anyway. Higlights are: * Mac platform improvements from Mark Cave-Ayland * Sam460ex improvements from BALATON Zoltan et al. * XICS interrupt handler cleanups from Cédric Le Goater * TCG improvements for atomic loads and stores from Richard Henderson * Assorted other bugfixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAls7D8oACgkQbDjKyiDZ s5Lxmg//YzPfC/nKqTTKkyJPzh/NnSC+kRTMAT3mbxdRIc7yfgMqJtWGGbS1iKgK EeJ9hl5Qm0HfscfDuzf0xasU62ZEv3kNdLnWJEIgkqiXrxoO5KCnC0y4D8NN1W03 mvINNCa8+QDg2OsirGmNUTkriiG3wLIrHTpLZ4+JuC2Bd9H3nTHZgJ0MXON/1VWY oRgr6kMZ5+IAzPhvYLFR6l3nPI883fgJOFyRo7YqYrkVBKFrFkfK0Xjw6vpsNxcx 2dE/YCHhNIriLuBG5noewL7GuqZRtLnl6rjjee5VAKIe1EmFeR+jsXwNjzGOVOJg dhjOtsJsQQ3WdEw5uImJzE64kV228WCgmkeXzZd1010JBLr7sUkrd2EuoZ23vvat uvZAHVSBrJg5WvzMo1VMEoPU3VeeZQ5HL+MI80iKiU6oUgRK11gVJcebtA0sEKt+ zhJC4JiUlHtZLTGIpMBmU8DJZ3Tyk1cBEm+Ky+SaPE+dsz16UHI0fazFQXJnXphE MLHEGAyQgzWYp7kIcAjUFev0Geq/Uovy4JKIGI6ISop1wRPEQDxkthfkfRyQxQkE zuse4EBcEH/Undw9KrmEQa0hCe+8BRkxklVbPesFPPdqH3PKNxtHYuWpSShQF0PW XMjw43O2Rbsl8kBUHCpy4pYSugD1hpfgaw/mVUOU1u/M1O6toTw= =AHrx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180703' into staging ppc patch queue 2018-07-03 Here's a last minue pull request before today's soft freeze. Ideally I would have sent this earlier, but I was waiting for a couple of extra fixes I knew were close. And the freeze crept up on me, like always. Most of the changes here are bugfixes in any case. There are some cleanups as well, which have been in my staging tree for a little while. There are a couple of truly new features (some extensions to the sam460ex platform), but these are low risk, since they only affect a new and not really stabilized machine type anyway. Higlights are: * Mac platform improvements from Mark Cave-Ayland * Sam460ex improvements from BALATON Zoltan et al. * XICS interrupt handler cleanups from Cédric Le Goater * TCG improvements for atomic loads and stores from Richard Henderson * Assorted other bugfixes # gpg: Signature made Tue 03 Jul 2018 06:55:22 BST # gpg: using RSA key 6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-3.0-20180703: (35 commits) ppc: Include vga cirrus card into the compiling process target/ppc: Relax reserved bitmask of indexed store instructions target/ppc: set is_jmp on ppc_tr_breakpoint_check spapr: compute default value of "hpt-max-page-size" later target/ppc/kvm: don't pass cpu to kvm_get_smmu_info() target/ppc/kvm: get rid of kvm_get_fallback_smmu_info() ppc440_uc: Basic emulation of PPC440 DMA controller sam460ex: Add RTC device hw/timer: Add basic M41T80 emulation ppc4xx_i2c: Rewrite to model hardware more closely hw/ppc: Give sam46ex its own config option fpu_helper.c: fix setting FPSCR[FI] bit target/ppc: Implement the rest of gen_st_atomic target/ppc: Implement the rest of gen_ld_atomic target/ppc: Use atomic min/max helpers target/ppc: Use MO_ALIGN for EXIWX and ECOWX target/ppc: Split out gen_st_atomic target/ppc: Split out gen_ld_atomic target/ppc: Split out gen_load_locked target/ppc: Tidy gen_conditional_store ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # hw/ppc/spapr.c	2018-07-03 14:59:27 +01:00
Sebastian Bauer	29f9cef39e	ppc: Include vga cirrus card into the compiling process Drivers for this card exists on PPC-based AmigaOS guests so it is useful to allow users to emulate the graphics card for PPC machines. As cirrus vga is currently preferred over std(vga) in absence of any user choice, this change also sets the default display of spapr machines to std as otherwise qemu refuses to start these machines. Not specifying an explicit graphics mode is for instance done by 'make check'. Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-07-03 11:23:09 +10:00
Greg Kurz	e89372951d	spapr: compute default value of "hpt-max-page-size" later It is currently not possible to run a pseries-2.12 or older machine with HV KVM. QEMU prints the following and exits right away. qemu-system-ppc64: KVM doesn't support for base page shift 34 The "hpt-max-page-size" capability was recently added to spapr to hide host configuration details from HPT mode guests. Its default value for newer machine types is 64k. For backwards compatibility, pseries-2.12 and older machine types need a different value. This is handled as usual in a class init function. The default value is 16G, ie, all page sizes supported by POWER7 and newer CPUs, but HV KVM requires guest pages to be hpa contiguous as well as gpa contiguous. The default value is the page size used to back the guest RAM in this case. Unfortunately kvmppc_hpt_needs_host_contiguous_pages()->kvm_enabled() is called way before KVM init and returns false, even if the user requested KVM. We thus end up selecting 16G, which isn't supported by HV KVM. The default value must be set during machine init, because we can safely assume that KVM is initialized at this point. We fix this by moving the logic to default_caps_with_cpu(). Since the user cannot pass cap-hpt-max-page-size=0, we set the default to 0 in the pseries-2.12 class init function and use that as a flag to do the real work. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-07-03 10:20:15 +10:00
Cédric Le Goater	abe82ebb20	ppc/xics: rework the ICS classes inheritance tree With the previous changes, we can now let the ICS_KVM class inherit directly from ICS_BASE class and not from the intermediate ICS_SIMPLE. It makes the class hierarchy much cleaner. What is left in the top classes is the low level interface to access the KVM XICS device in ICS_KVM and the XICS emulating handlers in ICS_SIMPLE. This should not break migration compatibility. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-07-03 09:56:51 +10:00
Philippe Mathieu-Daudé	ab3dd74924	hw/ppc: Use the IEC binary prefix definitions It eases code review, unit is explicit. Patch generated using: $ git grep -E '(1024\|2048\|4096\|8192\|(<<\|>>).?(10\|20\|30))' hw/ include/hw/ and modified manually. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20180625124238.25339-33-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-07-02 15:41:16 +02:00
Philippe Mathieu-Daudé	d23b6caadb	hw: Use IEC binary prefix definitions from "qemu/units.h" Code change produced with: $ git ls-files \| egrep '\.[ch]$' \| \ xargs sed -i -e 's/$\W[KMGTPE]$_BYTE/\1iB/g' Suggested-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Message-Id: <20180625124238.25339-6-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-07-02 15:41:10 +02:00
David Hildenbrand	f0b7bca64d	pc-dimm: get_memory_region() will not fail after realize Let's try to reduce error handling a bit. In the plug/unplug case, the device was realized and therefore we can assume that getting access to the memory region will not fail. For get_vmstate_memory_region() this is already handled that way. Document both cases. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180619134141.29478-13-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-28 19:05:34 +02:00
David Hildenbrand	284878ee98	pc-dimm: rename pc_dimm_memory_* to pc_dimm_* Let's rename it to make it look more consistent. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180619134141.29478-4-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-28 19:05:33 +02:00
David Gibson	123eec6552	spapr: Use maximum page size capability to simplify memory backend checking The way we used to handle KVM allowable guest pagesizes for PAPR guests required some convoluted checking of memory attached to the guest. The allowable pagesizes advertised to the guest cpus depended on the memory which was attached at boot, but then we needed to ensure that any memory later hotplugged didn't change which pagesizes were allowed. Now that we have an explicit machine option to control the allowable maximum pagesize we can simplify this. We just check all memory backends against that declared pagesize. We check base and cold-plugged memory at reset time, and hotplugged memory at pre_plug() time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-06-22 14:19:07 +10:00
David Gibson	2309832afd	spapr: Maximum (HPT) pagesize property The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means that every page that the guest puts in the pagetables must be truly physically contiguous, not just GPA-contiguous. In effect this means that an HPT guest can't use any pagesizes greater than the host page size used to back its memory. At present we handle this by changing what we advertise to the guest based on the backing pagesizes. This is pretty bad, because it means the guest sees a different environment depending on what should be host configuration details. As a start on fixing this, we add a new capability parameter to the pseries machine type which gives the maximum allowed pagesizes for an HPT guest. For now we just create and validate the parameter without making it do anything. For backwards compatibility, on older machine types we set it to the max available page size for the host. For the 3.0 machine type, we fix it to 16, the intention being to only allow HPT pagesizes up to 64kiB by default in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-06-22 14:19:07 +10:00
Cédric Le Goater	71b5c8d26e	spapr: remove unused spapr_irq routines spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-21 21:22:53 +10:00
Cédric Le Goater	4fe75a8ccd	spapr: split the IRQ allocation sequence Today, when a device requests for IRQ number in a sPAPR machine, the spapr_irq_alloc() routine first scans the ICSState status array to find an empty slot and then performs the assignement of the selected numbers. Split this sequence in two distinct routines : spapr_irq_find() for lookups and spapr_irq_claim() for claiming the IRQ numbers. This will ease the introduction of a static layout of IRQ numbers. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-21 21:22:53 +10:00
David Gibson	9f6edd066e	spapr: Compute effective capability values earlier Previously, the effective values of the various spapr capability flags were only determined at machine reset time. That was a lazy way of making sure it was after cpu initialization so it could use the cpu object to inform the defaults. But we've now improved the compat checking code so that we don't need to instantiate the cpus to use it. That lets us move the resolution of the capability defaults much earlier. This is going to be necessary for some future capabilities. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2018-06-21 21:22:53 +10:00
David Gibson	ad99d04c76	target/ppc: Allow cpu compatiblity checks based on type, not instance ppc_check_compat() is used in a number of places to check if a cpu object supports a certain compatiblity mode, subject to various constraints. It takes a PowerPCCPU *, however it really only depends on the cpu's class. We have upcoming cases where it would be useful to make compatibility checks before we fully instantiate the cpu objects. ppc_type_check_compat() will now make an equivalent check, but based on a CPU's QOM typename instead of an instantiated CPU object. We make use of the new interface in several places in spapr, where we're essentially making a global check, rather than one specific to a particular cpu. This avoids some ugly uses of first_cpu to grab a "representative" instance. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2018-06-21 21:22:53 +10:00
Greg Kurz	b94020268e	spapr_cpu_core: migrate per-CPU data A per-CPU machine data pointer was recently added to PowerPCCPU. The motivation is to to hide platform specific details from the core CPU code. This per-CPU data can hold state which is relevant to the guest though, eg, Virtual Processor Areas, and we should migrate this state. This patch adds the plumbing so that we can migrate the per-CPU data for PAPR guests. We only do this for newer machine types for the sake of backward compatibility. No state is migrated for the moment: the vmstate_spapr_cpu_state structure will be populated by subsequent patches. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Fix some trivial spelling and spacing errors] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-21 21:22:53 +10:00
Greg Kurz	844afc54ae	spapr: fix xics_system_init() error path Commit `3d85885a1b` tried to fix error handling, but it actually went into the wrong direction by dropping the local Error . In the default KVM case, the rationale is to try the in-kernel XICS first, and if not possible, to fallback to userland XICS. Passing errp everywhere makes this fallback impossible if errp is &error_fatal (which happens to be the case). And anyway, if the caller would pass a regular &local_err, things would be worse: we could possibly pass an already set errp to error_setg() and crash, or return an error even in case of success. So we definitely need a local Error * and only propagate it when we're done with the fallback logic. This is what this patch does. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-18 09:43:19 +10:00
David Hildenbrand	a4261be172	spapr: handle cpu core unplug via hotplug handler chain Factor out cpu core unplug into separate function from spapr_core_release(). Then use generic hotplug_handler_unplug() to trigger cpu core unplug, which would call spapr_machine_device_unplug() -> spapr_core_unplug() in the end. This way unplug operation is not buried in spapr internals and located in the same place like in other targets, following similar logic/call chain across targets. Acked-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-12 10:44:36 +10:00
David Hildenbrand	3ec71474ca	spapr: handle pc-dimm unplug via hotplug handler chain Factor out memory unplug into separate function from spapr_lmb_release(). Then use generic hotplug_handler_unplug() to trigger memory unplug, which will call spapr_machine_device_unplug() -> spapr_memory_unplug() in the end. This way unplug operation is not buried in lmb internals and located in the same place like in other targets, following similar logic/call chain across targets. Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-12 10:44:36 +10:00
David Hildenbrand	88432f44aa	spapr: introduce machine unplug handler We'll be handling unplug of e.g. CPUs and PCDIMMs via the general hotplug handler soon, so let's add that handler function. Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-12 10:44:36 +10:00
David Hildenbrand	4e8a01bdb2	spapr: move memory hotplug support check into spapr_memory_pre_plug() Let's finish cleaning up the hotplug handler. This check can be performed in the pre_plug code as the very first thing. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-12 10:44:36 +10:00
David Hildenbrand	81985f3be9	spapr: move lookup of the node into spapr_memory_plug() Let's clean the hotplug handler up by moving lookup of the node into the function where it is actually being used. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-12 10:44:36 +10:00
David Hildenbrand	fcc8ef17e2	spapr: no need to verify the node The node property can always be queried and the value has already been verified in pc_dimm_realize(). Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-12 10:44:36 +10:00
Peter Maydell	afd76ffba9	* Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlsRd2UUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPG8Qf+M85E8xAQ/bhs90tAymuXkUUsTIFF uI76K8eM0K3b2B+vGckxh1gyN5O3GQaMEDL7vITfqbX+EOH5U2lv8V9JRzf2YvbG Zahjd4pOCYzR0b9JENA1r5U/J8RntNrBNXlKmGTaXOaw9VCXlZyvgVd9CE3z/e2M 0jSXMBdF4LB3UzECI24Va8ejJxdSiJcqXA2j3J+pJFxI698i+Z5eBBKnRdo5TVe5 jl0TYEsbS6CLwhmbLXmt3Qhq+ocZn7YH9X3HjkHEdqDUeYWyT9jwUpa7OHFrIEKC ikWm9er4YDzG/vOC0dqwKbShFzuTpTJuMz5Mj4v8JjM/iQQFrp4afjcW2g== =RS/B -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) # gpg: Signature made Fri 01 Jun 2018 17:42:13 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (56 commits) hw: make virtio devices configurable via default-configs/ hw: allow compiling out SCSI memory: Make operations using MemoryRegionIoeventfd struct pass by pointer. char: Remove unwanted crlf conversion qdev: Remove DeviceClass::init() and ::exit() qdev: Simplify the SysBusDeviceClass::init path hw/i2c: Use DeviceClass::realize instead of I2CSlaveClass::init hw/i2c/smbus: Use DeviceClass::realize instead of SMBusDeviceClass::init target/i386/kvm.c: Remove compatibility shim for KVM_HINTS_REALTIME Update Linux headers to 4.17-rc6 target/i386/kvm.c: Handle renaming of KVM_HINTS_DEDICATED scripts/update-linux-headers: Handle kernel license no longer being one file scripts/update-linux-headers: Handle __aligned_u64 virtio-gpu-3d: Define VIRTIO_GPU_CAPSET_VIRGL2 elsewhere gdbstub: Prevent fd leakage docs/interop: add "firmware.json" ipmi: Use proper struct reference for KCS vmstate vmstate: Add a VSTRUCT type tcg: remove softfloat from --disable-tcg builds qemu-options: Mark the non-functional -clock option as deprecated ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-01 18:24:16 +01:00
Philippe Mathieu-Daudé	0304f9ec9c	hw: Do not include "sysemu/block-backend.h" if it is not necessary Remove those unneeded includes to speed up the compilation process a little bit. (Continue `7eceff5b5a` cleanup) Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180528232719.4721-13-f4bug@amsat.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-01 14:15:10 +02:00
Peter Maydell	d8c0c7af80	ppc: Rename 2.13 machines to 3.0 Rename the 2.13 machines to match the number we're going to use for the next release. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-id: 20180522104000.9044-5-peter.maydell@linaro.org	2018-05-29 11:28:46 +01:00
Igor Mammedov	debbdc0018	make sure that we aren't overwriting mc->get_hotplug_handler by accident Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1525691524-32265-5-git-send-email-imammedo@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-05-10 18:10:56 +01:00
David Hildenbrand	0c9269a52d	spapr: rename "hotplug memory" terminology to "device memory" Let's make it clear at relevant places that we are dealing with device memory. That it can be used for memory hotplug is just a special case. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-11-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	bd6c3e4a49	pc-dimm: pass in the machine and to the MemoryHotplugState We use the machine internally either way, so let's just pass it in then. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	acc7fa17e6	pc-dimm: no need to pass the memory region We can just query it ourselves. When unplugging, we should always be able to the region (as it was previously plugged). E.g. PPC already assumed that and used &error_abort. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	b0c14ec4ef	machine: make MemoryHotplugState accessible via the machine Let's allow to query the MemoryHotplugState directly from the machine. If the pointer is NULL, the machine does not support memory devices. If the pointer is !NULL, the machine supports memory devices and the data structure contains information about the applicable physical guest address space region. This allows us to generically detect if a certain machine has support for memory devices, and to generically manage it (find free address range, plug/unplug a memory region). We will rename "MemoryHotplugState" to something more meaningful ("DeviceMemory") after we completed factoring out the pc-dimm code into MemoryDevice code. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] [ehabkost: squashed fix to use g_malloc0()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	2cc0e2e814	pc-dimm: factor out MemoryDevice interface On the qmp level, we already have the concept of memory devices: "query-memory-devices" Right now, we only support NVDIMM and PCDIMM. We want to map other devices later into the address space of the guest. Such device could e.g. be virtio devices. These devices will have a guest memory range assigned but won't be exposed via e.g. ACPI. We want to make them look like memory device, but not glued to pc-dimm. Especially, it will not always be possible to have TYPE_PC_DIMM as a parent class (e.g. virtio devices). Let's use an interface instead. As a first part, convert handling of - qmp_pc_dimm_device_list - get_plugged_memory_size to our new model. plug/unplug stuff etc. will follow later. A memory device will have to provide the following functions: - get_addr(): Necessary, as the property "addr" can e.g. not be used for virtio devices (already defined). - get_plugged_size(): The amount this device offers to the guest as of now. - get_region_size(): Because this can later on be bigger than the plugged size. - fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
Greg Kurz	0550b1206a	spapr: don't advertise radix GTSE if max-compat-cpu < power9 On a POWER9 host, if a guest runs in pre POWER9 compat mode, it necessarily uses the hash MMU mode. In this case, we shouldn't advertise radix GTSE in the ibm,arch-vec-5-platform-support DT property as the current code does. The first reason is that it doesn't make sense, and the second one is that causes the CAS-negotiated options subsection to be migrated. This breaks backward migration to QEMU 2.7 and older versions on POWER8 hosts: qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr' qemu-system-ppc64: load of migration failed: No such file or directory This patch hence initialize CPUs a bit earlier so that we can check the requested compat mode, and don't set OV5_MMU_RADIX_GTSE for power8 and older. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-05-04 15:00:37 +10:00
Greg Kurz	aef19c04bf	spapr: don't migrate "spapr_option_vector_ov5_cas" to pre 2.8 machines `a324d6f166` "spapr: Support ibm,dynamic-memory-v2 property" added a new feature in the set of CAS-negotiatable options. This causes the CAS-negotiated options subsection to be migrated, even for old machine types that don't know about it, and breaks backward migration to QEMU 2.7 and older versions: qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr' qemu-system-ppc64: load of migration failed: No such file or directory Since this feature only affects boot time behaviour, it should be filtered out when we decide to migrate CAS-negotiated options, like we already do with OV5_FORM1_AFFINITY and OV5_DRCONF_MEMORY. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-05-04 15:00:37 +10:00
David Gibson	84369f639e	spapr: Make a helper to set up cpu entry point state Under PAPR, only the boot CPU is active when the system starts. Other cpus must be explicitly activated using an RTAS call. The entry state for the boot and secondary cpus isn't identical, but it has some things in common. We're going to add a bit more common setup later, too, so to simplify make a helper which sets up the common entry state for both boot and secondary cpu threads. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-05-04 15:00:37 +10:00
David Gibson	090052aa08	spapr: Remove support for explicitly allocated RMAs Current POWER cpus allow for a VRMA, a special mapping which describes a guest's view of memory when in real mode (MMU off, from the guest's point of view). Older cpus didn't have that which meant that to support a guest a special host-contiguous region of memory was needed to give the guest its Real Mode Area (RMA). KVM used to provide special calls to allocate a contiguous RMA for those cases. This was useful in the early days of KVM on Power to allow it to be tested on PowerPC 970 chips as used in Macintosh G5 machines. Now, those machines are so old as to be almost irrelevant. The normal qemu deprecation process would require this to be marked deprecated then removed in 2 releases. However, this can only be used with corresponding support in the host kernel - which was dropped years ago (in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970 processors" of 2014-12-03 to be precise). Therefore it should be ok to drop this immediately. Just to be clear this only affects KVM HV guests with PowerPC 970, and those already require an ancient host kernel. TCG and KVM PR guests with PowerPC 970 should still work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Thomas Huth <thuth@redhat.com>	2018-05-04 11:15:18 +10:00
Bharata B Rao	a324d6f166	spapr: Support ibm,dynamic-memory-v2 property The new property ibm,dynamic-memory-v2 allows memory to be represented in a more compact manner in device tree. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-27 18:05:23 +10:00
Serhii Popovych	da9f80fbad	spapr: Add ibm,max-associativity-domains property Now recent kernels (i.e. since linux-stable commit a346137e9142 ("powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes") support this property to mark initially memory-less NUMA nodes as "possible" to allow further memory hot-add to them. Advertise this property for pSeries machines to let guest kernels detect maximum supported node configuration and benefit from kernel side change when hot-add memory to specific, possibly empty before, NUMA node. Signed-off-by: Serhii Popovych <spopovyc@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-27 18:05:23 +10:00
David Gibson	67d7d66f27	target/ppc: Fold slb_nr into PPCHash64Options The env->slb_nr field gives the size of the SLB (Segment Lookaside Buffer). This is another static-after-initialization parameter of the specific version of the 64-bit hash MMU in the CPU. So, this patch folds the field into PPCHash64Options with the other hash MMU options. This is a bit more complicated that the things previously put in there, because slb_nr was foolishly included in the migration stream. So we need some of the usual dance to handle backwards compatible migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-04-27 18:05:22 +10:00
David Gibson	26cd35b861	target/ppc: Fold ci_large_pages flag into PPCHash64Options The ci_large_pages boolean in CPUPPCState is only relevant to 64-bit hash MMU machines, indicating whether it's possible to map large (> 4kiB) pages as cache-inhibitied (i.e. for IO, rather than memory). Fold it as another flag into the PPCHash64Options structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-04-27 18:05:22 +10:00
David Gibson	58969eeece	target/ppc: Move 1T segment and AMR options to PPCHash64Options Currently env->mmu_model is a bit of an unholy mess of an enum of distinct MMU types, with various flag bits as well. This makes which bits of the field should be compared pretty confusing. Make a start on cleaning that up by moving two of the flags bits - POWERPC_MMU_1TSEG and POWERPC_MMU_AMR - which are specific to the 64-bit hash MMU into a new flags field in PPCHash64Options structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-04-27 18:05:22 +10:00
David Gibson	644a2c99a9	target/ppc: Pass cpu instead of env to ppc_create_page_sizes_prop() As a rule we prefer to pass PowerPCCPU instead of CPUPPCState, and this change will make some things simpler later on. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2018-04-27 18:05:22 +10:00
Greg Kurz	b2692d5fed	spapr: drop useless dynamic sysbus device sanity check Since commit `7da79a167a`, the machine class init function registers dynamic sysbus device types it supports. Passing an unsupported device type on the command line causes QEMU to exit with an error message just after machine init. It is hence not needed to do the same sanity check at machine reset. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-27 18:05:22 +10:00
Serhii Popovych	e47f1d2786	Revert "spapr: Don't allow memory hotplug to memory less nodes" This reverts commit `b556854bd8`. Leave change @node type from uint32_t to to int from reverted commit because node < 0 is always false. Note that implementing capability or some trick to detect if guest kernel does not support hot-add to memory: this returns previous behavour where memory added to first non-empty node. Signed-off-by: Serhii Popovych <spopovyc@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-27 18:05:22 +10:00
Greg Kurz	1d36c75a9e	spapr: drop useless sanity check in spapr_irq_alloc() Both spapr_irq_alloc() and spapr_irq_alloc_block() have an errp parameter, but they don't use it if XICS hasn't been initialized yet. This is doubly wrong: - all callers do pass a non-null Error *, ie, they expect an error to be propagated in case of failure - XICS obviously needs to be initialized before anything starts allocating IRQs So this patch turns the check into an assert. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-27 18:05:22 +10:00
David Gibson	8a4fd427fe	spapr: Introduce pseries-2.13 machine type Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-27 18:05:22 +10:00
Peter Maydell	b8846a4d63	vl.c: new function serial_max_hds() Create a new function serial_max_hds() which returns the number of serial ports defined by the user. This is needed only by spapr. This allows us to remove the MAX_SERIAL_PORTS define. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180420145249.32435-14-peter.maydell@linaro.org	2018-04-26 13:58:29 +01:00
Peter Maydell	9bca0edb28	Change references to serial_hds[] to serial_hd() Change all the uses of serial_hds[] to go via the new serial_hd() function. Code change produced with: find hw -name '.[ch]' \| xargs sed -i -e 's/serial_hds\[$[^]]$\]/serial_hd(\1)/g' Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-id: 20180420145249.32435-8-peter.maydell@linaro.org	2018-04-26 13:57:00 +01:00
Alexey Kardashevskiy	127f03e442	spapr: Initialize reserved areas list in FDT in H_CAS handler At the moment the device tree produced by the H_CAS handler has no reserved map initialized at all which is not correct as at least one empty record is required to be present as a marker of the end. This does not cause problems now as the only consumer is SLOF which does not look at the reserved map area. However when DTC's "Improve libfdt's memory safety" changeset hits the QEMU upstream, there will be errors reported and crashes observed. This fixes the problem by adding an empty entry to the reserved map, just like create_device_tree() does already. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-04-10 10:05:38 +10:00
Peter Maydell	ed627b2ad3	virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJasR1rAAoJECgfDbjSjVRpOocH/R9A3g/TkpGjmLzJBrrX1NGO I/iq0ttHjqg4OBIChA4BHHjXwYUMs7XQn26B3efrk1otLAJhuqntZIIo3uU0WraA 5J+4DT46ogs5rZWNzDCZ0zAkSaATDA6h9Nfh7TvPc9Q2WpcIT0cTa/jOtrxRc9Vq 32hbUKtJSpNxRjwbZvk6YV21HtWo3Tktdaj9IeTQTN0/gfMyOMdgxta3+bymicbJ FuF9ybHcpXvrEctHhXHIL4/YVGEH/4shagZ4JVzv1dVdLeHLZtPomdf7+oc0+07m Qs+yV0HeRS5Zxt7w5blGLC4zDXczT/bUx8oln0Tz5MV7RR/+C2HwMOHC69gfpSc= =vomK -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (51 commits) postcopy shared docs libvhost-user: Claim support for postcopy postcopy: Allow shared memory vhost: Huge page align and merge vhost+postcopy: Wire up POSTCOPY_END notify vhost-user: Add VHOST_USER_POSTCOPY_END message libvhost-user: mprotect & madvises for postcopy vhost+postcopy: Call wakeups vhost+postcopy: Add vhost waker postcopy: postcopy_notify_shared_wake postcopy: helper for waking shared vhost+postcopy: Resolve client address postcopy-ram: add a stub for postcopy_request_shared_page vhost+postcopy: Helper to send requests to source for shared pages vhost+postcopy: Stash RAMBlock and offset vhost+postcopy: Send address back to qemu libvhost-user+postcopy: Register new regions with the ufd migration/ram: ramblock_recv_bitmap_test_byte_offset postcopy+vhost-user: Split set_mem_table for postcopy vhost+postcopy: Transmit 'listen' to slave ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # scripts/update-linux-headers.sh	2018-03-20 15:48:34 +00:00
Haozhong Zhang	52c95cae4e	pc-dimm: make qmp_pc_dimm_device_list() sort devices by address Make qmp_pc_dimm_device_list() return sorted by start address list of devices so that it could be reused in places that would need sorted list. Reuse existing pc_dimm_built_list() to get sorted list. While at it hide recursive callbacks from callers, so that: qmp_pc_dimm_device_list(qdev_get_machine(), &list); could be replaced with simpler: list = qmp_pc_dimm_device_list(); follow up patch will use it in build_srat() Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> for ppc part Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 03:34:52 +02:00
Thomas Huth	3c3a4e7afa	hw/ppc/spapr: Allow "spapr-vlan" as NIC model name beside "ibmveth" With the new "--nic" command line parameter option, the "old" way of specifying a NIC model via the nd_table[] is becoming more prominent again. But for the pseries "spapr-vlan" device, there is a confusing discrepancy between the model name that is used for "--device" (i.e. "spapr-vlan") and the model name that has to be used for "--net nic" or the new "--nic" parameter (i.e. "ibmveth"). Since "spapr-vlan" is the "real" name of the device, let's allow "spapr-vlan" to be used as model name for the nd_table[] entries, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-03-18 18:27:23 +11:00
Alexey Kardashevskiy	fcad0d2121	ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices This adds a possibility for the platform to tell VFIO not to emulate MSIX so MMIO memory regions do not get split into chunks in flatview and the entire page can be registered as a KVM memory slot and make direct MMIO access possible for the guest. This enables the entire MSIX BAR mapping to the guest for the pseries platform in order to achieve the maximum MMIO preformance for certain devices. Tested on: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-03-13 11:17:31 -06:00
Nikunj A Dadhania	90ee4e01a1	hw/ppc/spapr,e500: Use new property "stdout-path" for boot console Linux kernel commit 2a9d832cc9aae21ea827520fef635b6c49a06c6d (of: Add bindings for chosen node, stdout-path) deprecated chosen property "linux,stdout-path" and "stdout". Introduce the new property "stdout-path" and continue supporting the older property to remain compatible with existing/older firmware. This older property can be deprecated after 5 years. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-03-06 13:16:29 +11:00
Suraj Jitindar Singh	813f3cf655	ppc/spapr-caps: Define the pseries-2.12-sxxm machine type The sxxm (speculative execution exploit mitigation) machine type is a variant of the 2.12 machine type with workarounds for speculative execution vulnerabilities enabled by default. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-03-06 13:16:29 +11:00
Greg Kurz	1a5008fc17	spapr: harden code that depends on VSMT VSMT must be set in order to compute VCPU ids. This means that the following functions must not be called before spapr_set_vsmt_mode() was called: - spapr_vcpu_id() - spapr_is_thread0_in_vcore() - xics_max_server_number() We had a recent regression where the latter would be called before VSMT was set, and broke migration of some old machine types. This patch adds assert() in the above functions to avoid problems in the future. Also, since VSMT is really a CPU related thing, spapr_set_vsmt_mode() is now called from spapr_init_cpus(), just before the first VSMT user. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-03-06 13:16:29 +11:00
Greg Kurz	72fdd4de8e	spapr: register dummy ICPs later Some older machine types create more ICPs than needed. We hence need to register up to xics_max_server_number() dummy ICPs to accomodate the migration of these machine types. Recent VSMT rework changed xics_max_server_number() to return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads) instead of DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads); The change is okay but it requires spapr->vsmt to be set, which isn't the case with the current code. This causes the formula to return zero and we don't create dummy ICPs. This breaks migration of older guests as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1549087 The dummy ICP workaround doesn't really have a dependency on XICS itself. But it does depend on proper VCPU id numbering and it must be applied before creating vCPUs (ie, creating real ICPs). So this patch moves the workaround to spapr_init_cpus(), which already assumes VSMT to be set. Fixes: `72194664c8` ("spapr: use spapr->vsmt to compute VCPU ids") Reported-by: Lukas Doktor <ldoktor@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-03-06 13:16:29 +11:00
Greg Kurz	b1a568c1c2	spapr: fix missing CPU core nodes in DT when running with TCG Commit `5d0fb1508e` "spapr: consolidate the VCPU id numbering logic in a single place" introduced a helper to detect thread0 of a virtual core based on its VCPU id. This is used to create CPU core nodes in the DT, but it is broken in TCG. $ qemu-system-ppc64 -nographic -accel tcg -machine dumpdtb=dtb.bin \ -smp cores=16,maxcpus=16,threads=1 $ dtc -f -O dts dtb.bin \| grep POWER8 PowerPC,POWER8@0 { PowerPC,POWER8@8 { instead of the expected 16 cores that we get with KVM: $ dtc -f -O dts dtb.bin \| grep POWER8 PowerPC,POWER8@0 { PowerPC,POWER8@8 { PowerPC,POWER8@10 { PowerPC,POWER8@18 { PowerPC,POWER8@20 { PowerPC,POWER8@28 { PowerPC,POWER8@30 { PowerPC,POWER8@38 { PowerPC,POWER8@40 { PowerPC,POWER8@48 { PowerPC,POWER8@50 { PowerPC,POWER8@58 { PowerPC,POWER8@60 { PowerPC,POWER8@68 { PowerPC,POWER8@70 { PowerPC,POWER8@78 { This happens because spapr_get_vcpu_id() maps VCPU ids to cs->cpu_index in TCG mode. This confuses the code in spapr_is_thread0_in_vcore(), since it assumes thread0 VCPU ids to have a spapr->vsmt spacing. spapr_get_vcpu_id(cpu) % spapr->vsmt == 0 Actually, there's no real reason to expose cs->cpu_index instead of the VCPU id, since we also generate it with TCG. Also we already set it explicitly in spapr_set_vcpu_id(), so there's no real reason either to call kvm_arch_vcpu_id() with KVM. This patch unifies spapr_get_vcpu_id() to always return the computed VCPU id both in TCG and KVM. This is one step forward towards KVM<->TCG migration. Fixes: `5d0fb1508e` Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-03-06 13:16:29 +11:00
Greg Kurz	5d0fb1508e	spapr: consolidate the VCPU id numbering logic in a single place Several places in the code need to calculate a VCPU id: (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads (core_id / smp_threads) * spapr->vsmt (1 user) index * spapr->vsmt (2 users) or guess that the VCPU id of a given VCPU is the first thread of a virtual core: index % spapr->vsmt != 0 Even if the numbering logic isn't that complex, it is rather fragile to have these assumptions open-coded in several places. FWIW this was proved with recent issues related to VSMT. This patch moves the VCPU id formula to a single function to be called everywhere the code needs to compute one. It also adds an helper to guess if a VCPU is the first thread of a VCORE. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Rename spapr_is_vcore() to spapr_is_thread0_in_vcore() for clarity] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-16 12:14:26 +11:00
Greg Kurz	14bb4486c8	spapr: rename spapr_vcpu_id() to spapr_get_vcpu_id() The spapr_vcpu_id() function is an accessor actually. Let's rename it for symmetry with the recently added spapr_set_vcpu_id() helper. The motivation behind this is that a later patch will consolidate the VCPU id formula in a function and spapr_vcpu_id looks like an appropriate name. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-16 12:14:26 +11:00
Greg Kurz	648edb6475	spapr: move VCPU calculation to core machine code The VCPU ids are currently computed and assigned to each individual CPU threads in spapr_cpu_core_realize(). But the numbering logic of VCPU ids is actually a machine-level concept, and many places in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes. The current formula used in spapr_cpu_core_realize() is: vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i where: cc->core_id is a multiple of smp_threads cpu_index = cc->core_id + i 0 <= i < smp_threads So we have: cpu_index % smp_threads == i cc->core_id / smp_threads == cpu_index / smp_threads hence: vcpu_id = (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads; This formula was used before VSMT at the time VCPU ids where computed at the target emulation level. It has the advantage of being useable to derive a VPCU id out of a CPU index only. It is fitted for all the places where the machine code has to compute a VCPU id. This patch introduces an accessor to set the VCPU id in a PowerPCCPU object using the above formula. It is a first step to consolidate all the VCPU id logic in a single place. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-16 12:14:26 +11:00
Greg Kurz	72194664c8	spapr: use spapr->vsmt to compute VCPU ids Since the introduction of VSMT in 2.11, the spacing of VCPU ids between cores is controllable through a machine property instead of being only dictated by the SMT mode of the host: cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i Until recently, the machine code would try to change the SMT mode of the host to be equal to VSMT or exit. This allowed the rest of the code to assume that kvmppc_smt_threads() == spapr->vsmt is always true. Recent commit "8904e5a75005 spapr: Adjust default VSMT value for better migration compatibility" relaxed the rule. If the VSMT mode cannot be set in KVM for some reasons, but the requested CPU topology is compatible with the current SMT mode, then we let the guest run with kvmppc_smt_threads() != spapr->vsmt. This breaks quite a few places in the code, in particular when calculating DRC indexes. This is what happens on a POWER host with subcores-per-core=2 (ie, supports up to SMT4) when passing the following topology: -smp threads=4,maxcpus=16 \ -device host-spapr-cpu-core,core-id=4,id=core1 \ -device host-spapr-cpu-core,core-id=8,id=core2 qemu-system-ppc64: warning: Failed to set KVM's VSMT mode to 8 (errno -22) This is expected since KVM is limited to SMT4, but the guest is started anyway because this topology can run on SMT4 even with a VSMT8 spacing. But when we look at the DT, things get nastier: cpus { ... ibm,drc-indexes = <0x4 0x10000000 0x10000004 0x10000008 0x1000000c>; This means that we have the following association: CPU core device \| DRC \| VCPU id -----------------+------------+--------- boot core \| 0x10000000 \| 0 core1 \| 0x10000004 \| 4 core2 \| 0x10000008 \| 8 core3 \| 0x1000000c \| 12 But since the spacing of VCPU ids is 8, the DRC for core1 points to a VCPU that doesn't exist, the DRC for core2 points to the first VCPU of core1 and and so on... ... PowerPC,POWER8@0 { ... ibm,my-drc-index = <0x10000000>; ... }; PowerPC,POWER8@8 { ... ibm,my-drc-index = <0x10000008>; ... }; PowerPC,POWER8@10 { ... No ibm,my-drc-index property for this core since 0x10000010 doesn't exist in ibm,drc-indexes above. ... }; }; ... interrupt-controller { ... ibm,interrupt-server-ranges = <0x0 0x10>; With a spacing of 8, the highest VCPU id for the given topology should be: 16 * 8 / 4 = 32 and not 16 ... linux,phandle = <0x7e7323b8>; interrupt-controller; }; And CPU hot-plug/unplug is broken: (qemu) device_del core1 pseries-hotplug-cpu: Cannot find CPU (drc index 10000004) to remove (qemu) device_del core2 cpu 4 (hwid 8) Ready to die... cpu 5 (hwid 9) Ready to die... cpu 6 (hwid 10) Ready to die... cpu 7 (hwid 11) Ready to die... These are the VCPU ids of core1 actually (qemu) device_add host-spapr-cpu-core,core-id=12,id=core3 (qemu) device_del core3 pseries-hotplug-cpu: Cannot find CPU (drc index 1000000c) to remove This patches all the code in hw/ppc/spapr.c to assume the VSMT spacing when manipulating VCPU ids. Fixes: `8904e5a750` Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-16 12:14:26 +11:00
Laurent Vivier	4ad64cbd0c	spapr: set vsmt to MAX(8, smp_threads) We ignore silently the value of smp_threads when we set the default VSMT value, and if smp_threads is greater than VSMT kernel is going into trouble later. Fixes: `8904e5a750` ("spapr: Adjust default VSMT value for better migration compatibility") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-10 20:22:02 +11:00
Daniel Henrique Barboza	b472b1a727	hw/ppc: rename functions in comments Commit `bcb5ce08cf` ("spapr: Rename machine init functions for clarity") renamed ppc_spapr_reset to spapr_machine_reset and ppc_spapr_init to spapr_machine_init. Let's also rename the references in comments. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-10 12:17:17 +11:00
Markus Armbruster	abb297ed44	Include qmp-commands.h exactly where needed Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-7-armbru@redhat.com> [OSX breakage fixed]	2018-02-09 13:52:10 +01:00
Suraj Jitindar Singh	4be8d4e7d9	target/ppc/spapr_caps: Add new tristate cap safe_indirect_branch Add new tristate cap cap-ibs to represent the indirect branch serialisation capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh	09114fd817	target/ppc/spapr_caps: Add new tristate cap safe_bounds_check Add new tristate cap cap-sbbc to represent the speculation barrier bounds checking capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-29 14:24:55 +11:00
Suraj Jitindar Singh	8f38eaf8f9	target/ppc/spapr_caps: Add new tristate cap safe_cache Add new tristate cap cap-cfpc to represent the cache flush on privilege change capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-29 14:24:55 +11:00
Greg Kurz	9012a53f06	spapr: fix device tree properties when using compatibility mode Commit `51f84465dd` changed the compatility mode setting logic: - machine reset only sets compatibility mode for the boot CPU - compatibility mode is set for other CPUs when they are put online by the guest with the "start-cpu" RTAS call This causes a regression for machines started with max-compat-cpu: the device tree nodes related to secondary CPU cores contain wrong "cpu-version" and "ibm,pa-features" values, as shown below. Guest started on a POWER8 host with: -smp cores=2 -machine pseries,max-cpu-compat=compat7 ibm,pa-features = [18 00 f6 3f c7 c0 80 f0 80 00 00 00 00 00 00 00 00 00 80 00 80 00 80 00 00 00]; cpu-version = <0x4d0200>; ^^^ second CPU core ibm,pa-features = <0x600f63f 0xc70080c0>; cpu-version = <0xf000003>; ^^^ boot CPU core The second core is advertised in raw POWER8 mode. This happens because CAS assumes all CPUs to have the same compatibility mode. Since the boot CPU already has the requested compatibility mode, the CAS code does not set it for the secondary one, and exposes the bogus device tree properties in in the CAS response to the guest. A similar situation is observed when hot-plugging a CPU core. The related device tree properties are generated and exposed to guest with the "ibm,configure-connector" RTAS before "start-cpu" is called. The CPU core is advertised to the guest in raw mode as well. It both cases, it boils down to the fact that "start-cpu" happens too late. This can be fixed globally by propagating the compatibility mode of the boot CPU to the other CPUs during reset. For this to work, the compatibility mode of the boot CPU must be set before the machine code actually resets all CPUs. It is not needed to set the compatibility mode in "start-cpu" anymore, so the code is dropped. Fixes: `51f84465dd` Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-20 17:15:05 +11:00
Greg Kurz	bc8772835f	spapr: drop duplicate variable in spapr_core_plug() A variable is already defined at the begining of the function to hold a pointer to the CPU core object: sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev)); No need to define it again in the pre-2.10 compatibility code snipplet. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-20 17:15:05 +11:00
Igor Mammedov	d342eb7662	possible_cpus: add CPUArchId::type field Remove dependency of possible_cpus on 1st CPU instance, which decouples configuration data from CPU instances that are created using that data. Also later it would be used for enabling early cpu to numa node configuration at runtime qmp_query_hotpluggable_cpus() should provide a list of available cpu slots at early stage, before machine_init() is called and the 1st cpu is created, so that mgmt might be able to call it and use output to set numa mapping. Use MachineClass::possible_cpu_arch_ids() callback to set cpu type info, along with the rest of possible cpu properties, to let machine define which cpu type* will be used. * for SPAPR it will be a spapr core type and for ARM/s390x/x86 a respective descendant of CPUClass. Move parse_numa_opts() in vl.c after cpu_model is parsed into cpu_type so that possible_cpu_arch_ids() would know which cpu_type to use during layout initialization. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1515597770-268979-1-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-01-19 11:18:51 -02:00
Eduardo Habkost	7da79a167a	spapr: Allow only supported dynamic sysbus devices TYPE_SPAPR_PCI_HOST_BRIDGE is the only dynamic sysbus device not rejected by ppc_spapr_reset(), so it can be the only entry on the allowed list. Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: qemu-ppc@nongnu.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20171125151610.20547-5-ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-01-19 11:18:51 -02:00
Eduardo Habkost	0bd1909da6	machine: Replace has_dynamic_sysbus with list of allowed devices The existing has_dynamic_sysbus flag makes the machine accept every user-creatable sysbus device type on the command-line. Replace it with a list of allowed device types, so machines can easily accept some sysbus devices while rejecting others. To keep exactly the same behavior as before, the existing has_dynamic_sysbus=true assignments are replaced with a TYPE_SYS_BUS_DEVICE entry on the allowed list. Other patches will replace the TYPE_SYS_BUS_DEVICE entries with more specific lists of devices. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Graf <agraf@suse.de> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: qemu-arm@nongnu.org Cc: qemu-ppc@nongnu.org Cc: xen-devel@lists.xenproject.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20171125151610.20547-2-ehabkost@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-01-19 11:18:51 -02:00
David Gibson	8904e5a750	spapr: Adjust default VSMT value for better migration compatibility `fa98fbfc` "PC: KVM: Support machine option to set VSMT mode" introduced the "vsmt" parameter for the pseries machine type, which controls the spacing of the vcpu ids of thread 0 for each virtual core. This was done to bring some consistency and stability to how that was done, while still allowing backwards compatibility for migration and otherwise. The default value we used for vsmt was set to the max of the host's advertised default number of threads and the number of vthreads per vcore in the guest. This was done to continue running without extra parameters on older KVM versions which don't allow the VSMT value to be changed. Unfortunately, even that smaller than before leakage of host configuration into guest visible configuration still breaks things. Specifically a guest with 4 (or less) vthread/vcore will get a different vsmt value when running on a POWER8 (vsmt==8) and POWER9 (vsmt==4) host. That means the vcpu ids don't line up so you can't migrate between them, though you should be able to. Long term we really want to make vsmt == smp_threads for sufficiently new machine types. However, that means that qemu will then require a sufficiently recent KVM (one which supports changing VSMT) - that's still not widely enough deployed to be really comfortable to do. In the meantime we need some default that will work as often as possible. This patch changes that default to 8 in all circumstances. This does change guest visible behaviour (including for existing machine versions) for many cases - just not the most common/important case. Following is case by case justification for why this is still the least worst option. Note that any of the old behaviours can still be duplicated after this patch, it's just that it requires manual intervention by setting the vsmt property on the command line. KVM HV on POWER8 host: This is the overwhelmingly common case in production setups, and is unchanged by design. POWER8 hosts will advertise a default VSMT mode of 8, and > 8 vthreads/vcore isn't permitted KVM HV on POWER7 host: Will break, but POWER7s allowing KVM were never released to the public. KVM HV on POWER9 host: Not yet released to the public, breaking this now will reduce other breakage later. KVM HV on PowerPC 970: Will theoretically break it, but it was barely supported to begin with and already required various user visible hacks to work. Also so old that I just don't care. TCG: This is the nastiest one; it means migration of TCG guests (without manual vsmt setting) will break. Since TCG is rarely used in production I think this is worth it for the other benefits. It does also remove one more barrier to TCG<->KVM migration which could be interesting for debugging applications. KVM PR: As with TCG, this will break migration of existing configurations, without adding extra manual vsmt options. As with TCG, it is rare in production so I think the benefits outweigh breakages. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	1f20f2e0ee	spapr: Allow some cases where we can't set VSMT mode in the kernel At present if we require a vsmt mode that's not equal to the kernel's default, and the kernel doesn't let us change it (e.g. because it's an old kernel without support) then we always fail. But in fact we can cope with the kernel having a different vsmt as long as a) it's >= the actual number of vthreads/vcore (so that guest threads that are supposed to be on the same core act like it) b) it's a submultiple of the requested vsmt mode (so that guest threads spaced by the vsmt value will act like they're on different cores) Allowing this case gives us a bit more freedom to adjust the vsmt behaviour without breaking existing cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	abbc124753	target/ppc: Clarify compat mode max_threads value We recently had some discussions that were sidetracked for a while, because nearly everyone misapprehended the purpose of the 'max_threads' field in the compatiblity modes table. It's all about guest expectations, not host expectations or support (that's handled elsewhere). In an attempt to avoid a repeat of that confusion, rename the field to 'max_vthreads' and add an explanatory comment. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>	2018-01-17 09:35:24 +11:00
Suraj Jitindar Singh	4e5fe3688e	hw/ppc/spapr_caps: Rework spapr_caps to use uint8 internal representation Currently spapr_caps are tied to boolean values (on or off). This patch reworks the caps so that they can have any uint8 value. This allows more capabilities with various values to be represented in the same way internally. Capabilities are numbered in ascending order. The internal representation of capability values is an array of uint8s in the sPAPRMachineState, indexed by capability number. Capabilities can have their own name, description, options, getter and setter functions, type and allow functions. They also each have their own section in the migration stream. Capabilities are only migrated if they were explictly set on the command line, with the assumption that otherwise the default will match. On migration we ensure that the capability value on the destination is greater than or equal to the capability value from the source. So long at this remains the case then the migration is considered compatible and allowed to continue. This patch implements generic getter and setter functions for boolean capabilities. It also converts the existings cap-htm, cap-vsx and cap-dfp capabilities to this new format. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-17 09:35:24 +11:00
David Gibson	2d1fb9bc8e	spapr: Handle Decimal Floating Point (DFP) as an optional capability Decimal Floating Point has been available on POWER7 and later (server) cpus. However, it can be disabled on the hypervisor, meaning that it's not available to guests. We currently handle this by conditionally advertising DFP support in the device tree depending on whether the guest CPU model supports it - which can also depend on what's allowed in the host for -cpu host. That can lead to confusion on migration, since host properties are silently affecting guest visible properties. This patch handles it by treating it as an optional capability for the pseries machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	2938664286	spapr: Handle VMX/VSX presence as an spapr capability flag We currently have some conditionals in the spapr device tree code to decide whether or not to advertise the availability of the VMX (aka Altivec) and VSX vector extensions to the guest, based on whether the guest cpu has those features. This can lead to confusion and subtle failures on migration, since it makes a guest visible change based only on host capabilities. We now have a better mechanism for this, in spapr capabilities flags, which explicitly depend on user options rather than host capabilities. Rework the advertisement of VSX and VMX based on a new VSX capability. We no longer bother with a conditional for VMX support, because every CPU that's ever been supported by the pseries machine type supports VMX. NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on availability of VSX in libc, so using cap-vsx=off may lead to a fatal SIGILL in init. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	be85537d65	spapr: Validate capabilities on migration Now that the "pseries" machine type implements optional capabilities (well, one so far) there's the possibility of having different capabilities available at either end of a migration. Although arguably a user error, it would be nice to catch this situation and fail as gracefully as we can. This adds code to migrate the capabilities flags. These aren't pulled directly into the destination's configuration since what the user has specified on the destination command line should take precedence. However, they are checked against the destination capabilities. If the source was using a capability which is absent on the destination, we fail the migration, since that could easily cause a guest crash or other bad behaviour. If the source lacked a capability which is present on the destination we warn, but allow the migration to proceed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	ee76a09fc7	spapr: Treat Hardware Transactional Memory (HTM) as an optional capability This adds an spapr capability bit for Hardware Transactional Memory. It is enabled by default for pseries-2.11 and earlier machine types. with POWER8 or later CPUs (as it must be, since earlier qemu versions would implicitly allow it). However it is disabled by default for the latest pseries-2.12 machine type. This means that with the latest machine type, HTM will not be available, regardless of CPU, unless it is explicitly enabled on the command line. That change is made on the basis that: * This way running with -M pseries,accel=tcg will start with whatever cpu and will provide the same guest visible model as with accel=kvm. - More specifically, this means existing make check tests don't have to be modified to use cap-htm=off in order to run with TCG * We hope to add a new "HTM without suspend" feature in the not too distant future which could work on both POWER8 and POWER9 cpus, and could be enabled by default. * Best guesses suggest that future POWER cpus may well only support the HTM-without-suspend model, not the (frankly, horribly overcomplicated) POWER8 style HTM with suspend. * Anecdotal evidence suggests problems with HTM being enabled when it wasn't wanted are more common than being missing when it was. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	33face6b89	spapr: Capabilities infrastructure Because PAPR is a paravirtual environment access to certain CPU (or other) facilities can be blocked by the hypervisor. PAPR provides ways to advertise in the device tree whether or not those features are available to the guest. In some places we automatically determine whether to make a feature available based on whether our host can support it, in most cases this is based on limitations in the available KVM implementation. Although we correctly advertise this to the guest, it means that host factors might make changes to the guest visible environment which is bad: as well as generaly reducing reproducibility, it means that a migration between different host environments can easily go bad. We've mostly gotten away with it because the environments considered mature enough to be well supported (basically, KVM on POWER8) have had consistent feature availability. But, it's still not right and some limitations on POWER9 is going to make it more of an issue in future. This introduces an infrastructure for defining "sPAPR capabilities". These are set by default based on the machine version, masked by the capabilities of the chosen cpu, but can be overriden with machine properties. The intention is at reset time we verify that the requested capabilities can be supported on the host (considering TCG, KVM and/or host cpu limitations). If not we simply fail, rather than silently modifying the advertised featureset to the guest. This does mean that certain configurations that "worked" may now fail, but such configurations were already more subtly broken. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2018-01-17 09:35:24 +11:00
David Gibson	51f84465dd	spapr: Correct compatibility mode setting for hotplugged CPUs Currently the pseries machine sets the compatibility mode for the guest's cpus in two places: 1) at machine reset and 2) after CAS negotiation. This means that if we set or negotiate a compatiblity mode, then hotplug a cpu, the hotplugged cpu doesn't get the right mode set and will incorrectly have the full native features. To correct this, we set the compatibility mode on a cpu when it is brought online with the 'start-cpu' RTAS call. Given that we no longer need to set the compatibility mode on all CPUs at machine reset, so we change that to only set the mode for the boot cpu. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2018-01-10 12:53:00 +11:00
Laurent Vivier	1481fe5fcf	spapr: don't initialize PATB entry if max-cpu-compat < power9 if KVM is enabled and KVM capabilities MMU radix is available, the partition table entry (patb_entry) for the radix mode is initialized by default in ppc_spapr_reset(). It's a problem if we want to migrate the guest to a POWER8 host while the kernel is not started to set the value to the one expected for a POWER8 CPU. The "-machine max-cpu-compat=power8" should allow to migrate a POWER9 KVM host to a POWER8 KVM host, but because patb_entry is set, the destination QEMU tries to enable radix mode on the POWER8 host. This fails and cancels the migration: Process table config unsupported by the host error while loading state for instance 0x0 of device 'spapr' load of migration failed: Invalid argument This patch doesn't set the PATB entry if the user provides a CPU compatibility mode that doesn't support radix mode. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:50:29 +11:00
David Gibson	4f441474c6	spapr: Assume msi_nonbroken We conditionally adjust part of the guest device tree based on the global msi_nonbroken flag. However, the main machine type code initializes msi_nonbroken to true and there's nothing that would set it to false again. So replace the test with an assert(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-12-15 09:49:24 +11:00
David Gibson	bcb5ce08cf	spapr: Rename machine init functions for clarity Machine objects have two init functions - the generic QOM level instance_init which should only do static object initialization, and the Machine specific MachineClass::init which does the actual construction of the machine. In spapr the functions implementing these two have names - ppc_machine_initfn() and ppc_spapr_init() - which don't correspond closely to either of those. To prevent people (read, me) from confusing which is which, rename them spapr_instance_init() and spapr_machine_init() to make it clearer which is which. While we're there rename ppc_spapr_reset() to spapr_machine_reset() to match. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-12-15 09:49:24 +11:00
Igor Mammedov	f47bd1c839	spapr: replace numa_get_node() with lookup in pc-dimm list SPAPR is the last user of numa_get_node() and a bunch of supporting code to maintain numa_info[x].addr list. Get LMB node id from pc-dimm list, which allows to remove ~80LOC maintaining dynamic address range lookup list. It also removes pc-dimm dependency on numa_[un]set_mem_node_id() and makes pc-dimms a sole source of information about which node it belongs to and removes duplicate data from global numa_info. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	7718375584	spapr: introduce a spapr_qirq() helper xics_get_qirq() is only used by the sPAPR machine. Let's move it there and change its name to reflect its scope. It will be useful for XIVE support which will use its own set of qirqs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	9e7dc5fc2e	spapr: introduce a spapr_irq_set_lsi() helper It will make synchronisation easier with the XIVE interrupt mode when available. The 'irq' parameter refers to the global IRQ number space. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	60c6823b9b	spapr: move the IRQ allocation routines under the machine Also change the prototype to use a sPAPRMachineState and prefix them with spapr_irq_. It will let us synchronise the IRQ allocation with the XIVE interrupt mode when available. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Greg Kurz	94ad93bd97	spapr_cpu_core: instantiate CPUs separately The current code assumes that only the CPU core object holds a reference on each individual CPU object, and happily frees their allocated memory when the core is unrealized. This is dangerous as some other code can legitimely keep a pointer to a CPU if it calls object_ref(), but it would end up with a dangling pointer. Let's allocate all CPUs with object_new() and let QOM free them when their reference count reaches zero. This greatly simplify the code as we don't have to fiddle with the instance size anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:23 +11:00
David Gibson	2b6154120c	spapr: Add pseries-2.12 machine type While we're at it fix a couple of small errors in the 2.11 and 2.10 models (they didn't have any real effect, but don't quite match the template). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:23 +11:00
David Gibson	768a20f3a4	spapr: Include "pre-plugged" DIMMS in ram size calculation at reset At guest reset time, we allocate a hash page table (HPT) for the guest based on the guest's RAM size. If dynamic HPT resizing is not available we use the maximum RAM size, if it is we use the current RAM size. But the "current RAM size" calculation is incorrect - we just use the "base" ram_size from the machine structure. This doesn't include any pluggable DIMMs that are already plugged at reset time. This means that if you try to start a 'pseries' machine with a DIMM specified on the command line that's much larger than the "base" RAM size, then the guest will get a woefully inadequate HPT. This can lead to a guest freeze during boot as it runs out of HPT space during initial MMU setup. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2017-12-04 11:31:22 +11:00
Laurent Vivier	0c86b2df78	pseries: fix TCG migration Migration of pseries is broken with TCG because QEMU tries to restore KVM MMU state unconditionally. The result is a SIGSEGV in kvm_vm_ioctl(): #0 kvm_vm_ioctl (s=0x0, type=-2146390353) at qemu/accel/kvm/kvm-all.c:2032 #1 0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>, radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>) at qemu/target/ppc/kvm.c:396 #2 0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0, version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578 #3 0x000000010059e4cc in vmstate_load_state (f=0x106230000, vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0, version_id=<optimized out>) at qemu/migration/vmstate.c:165 #4 0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>) at qemu/migration/savevm.c:748 This patch fixes the problem by not calling the KVM function with the TCG mode. Fixes: `d39c90f5f3` ("spapr: Fix migration of Radix guests") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-30 13:57:51 +11:00
Suraj Jitindar Singh	ee4d9ecc36	target/ppc: Move setting of patb_entry on hash table init The patb_entry is used to store the location of the process table in guest memory. The msb is also used to indicate the mmu mode of the guest, that is patb_entry & 1 << 63 ? radix_mode : hash_mode. Currently we set this to zero in spapr_setup_hpt_and_vrma() since if this function gets called then we know we're hash. However some code paths, such as setting up the hpt on incoming migration of a hash guest, call spapr_reallocate_hpt() directly bypassing this higher level function. Since we assume radix if the host is capable this results in the msb in patb_entry being left set so in spapr_post_load() we call kvmppc_configure_v3_mmu() and tell the host we're radix which as expected means addresses cannot be translated once we actually run the cpu. To fix this move the zeroing of patb_entry into spapr_reallocate_hpt(). Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-27 12:20:11 +11:00
Thomas Huth	bac658d1a4	hw/ppc/spapr: Fix virtio-scsi bootindex handling for LUNs >= 256 LUNs >= 256 have to be encoded with the so-called "flat space addressing method" for virtio-scsi, where an additional bit has to be set. SLOF already took care of this with the following commit: https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=f72a37713fea47da (see https://bugzilla.redhat.com/show_bug.cgi?id=1431584 for details) But QEMU does not use this encoding yet for device tree paths that have to be handed over to SLOF to deal with the "bootindex" property, so SLOF currently fails to boot from virtio-scsi devices with LUNs >= 256 in the right boot order. Fix it by using the bit to indicate the "flat space addressing method" for LUNs >= 256. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-22 15:28:37 +11:00
Greg Kurz	8251248394	spapr: reset DRCs after devices A DRC with a pending unplug request releases its associated device at machine reset time. In the case of LMB, when all DRCs for a DIMM device have been reset, the DIMM gets unplugged, causing guest memory to disappear. This may be very confusing for anything still using this memory. This is exactly what happens with vhost backends, and QEMU aborts with: qemu-system-ppc64: used ring relocated for ring 2 qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed. The issue is that each DRC registers a QEMU reset handler, and we don't control the order in which these handlers are called (ie, a LMB DRC will unplug a DIMM before the virtio device using the memory on this DIMM could stop its vhost backend). To avoid such situations, let's reset DRCs after all devices have been reset. Reported-by: Mallesh N. Koti <mallesh@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-20 10:10:56 +11:00
Suraj Jitindar Singh	7abd43baec	target/ppc: Update setting of cpu features to account for compat modes The device tree nodes ibm,arch-vec-5-platform-support and ibm,pa-features are used to communicate features of the cpu to the guest operating system. The properties of each of these are determined based on the selected cpu model and the availability of hypervisor features. Currently the compatibility mode of the cpu is not taken into account. The ibm,arch-vec-5-platform-support node is used to communicate the level of support for various ISAv3 processor features to the guest before CAS to inform the guests' request. The available mmu mode should only be hash unless the cpu is a POWER9 which is not in a prePOWER9 compat mode, in which case the available modes depend on the accelerator and the hypervisor capabilities. The ibm,pa-featues node is used to communicate the level of cpu support for various features to the guest os. This should only contain features relevant to the operating mode of the processor, that is the selected cpu model taking into account any compat mode. This means that the compat mode should be taken into account when choosing the properties of ibm,pa-features and they should match the compat mode selected, or the cpu model selected if no compat mode. Update the setting of these cpu features in the device tree as described above to properly take into account any compat mode. We use the ppc_check_compat function which takes into account the current processor model and the cpu compat mode. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-20 10:07:49 +11:00
Igor Mammedov	2e9c10eba0	ppc: spapr: use generic cpu_model parsing use generic cpu_model parsing introduced by (`6063d4c0f` vl.c: convert cpu_model to cpu type and set of global properties before machine_init()) it allows to: * replace sPAPRMachineClass::tcg_default_cpu with MachineClass::default_cpu_type * drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse one in vl.c * simplify spapr_get_cpu_core_type() by removing not needed anymore recurrsion since alias look up happens earlier at vl.c and spapr_get_cpu_core_type() works only with resulted from that cpu type. * spapr no more needs to parse/depend on being phased out MachineState::cpu_model, all tha parsing done by generic code and target specific callback. Signed-off-by: Igor Mammedov <imammedo@redhat.com> [dwg: Correct minor compile error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	17be88a713	ppc: spapr: use cpu model names as tcg defaults instead of aliases Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	b51d3c8818	ppc: spapr: use cpu type name directly replace sPAPRCPUCoreClass::cpu_class with cpu type name since it were needed just to get that at points it were accessed. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Igor Mammedov	b8e999673b	ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr() there is a dedicated callback CPUClass::parse_features which purpose is to convert -cpu features into a set of global properties AND deal with compat/legacy features that couldn't be directly translated into CPU's properties. Create ppc variant of it (ppc_cpu_parse_featurestr) and move 'compat=val' handling from spapr_cpu_core.c into it. That removes a dependency of board/core code on cpu_model parsing and would let to reuse common -cpu parsing introduced by `6063d4c0` Set "max-cpu-compat" property only if it exists, in practice it should limit 'compat' hack to spapr machine and allow to avoid including machine/spapr headers in target/ppc/cpu.c Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Daniel Henrique Barboza	2a129767eb	hw/ppc/spapr.c: abort unplug_request if previous unplug isn't done LMB removal is completed only when the spapr_lmb_release callback is called after all DRCs of the dimm are detached. During this time, it is possible that a unplug request for the same dimm arrives, trying to detach DRCs that were detached by the guest in the first unplug_request. BQL doesn't help in this case - the lock will prevent any concurrent removal from happening until the end of spapr_memory_unplug_request only. What happens is that the second unplug_request ends up calling spapr_drc_detach in a DRC that were detached already, causing an assert error in spapr_drc_detach (e.g https://bugs.launchpad.net/qemu/+bug/1718118). spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left to be detached by the guest. When there are no more DRCs left, this structure is deleted and the pc-dimm unplug handler is called to finish the process. This patch reuses the sPAPRDIMMState to allow unplug_request to know if there is an ongoing unplug process for a given dimm, aborting the unplug request in this case, by doing the following changes: - in spapr_lmb_release callback, move the dimm state removal to the end, after pc-dimm unplug handler. With this change we can check for the existence of the dimm state to see if the unplug process is done. - use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request to check if the dimm state exists. If positive, there is an unplug operation already in progress for this dimm, meaning that we should abort it and warn the user about it. Fixes: https://bugs.launchpad.net/qemu/+bug/1718118 Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	827b17c468	spapr: sanity check size of the CAS buffer The CAS buffer is provided by SLOF. A broken SLOF could pass a silly size: either smaller than the diff header, in which case the current code will try to allocate 16 Exabytes of memory and g_malloc0() will abort, or bigger than the maximum memory provisioned for SLOF (ie, 40 Megabytes), which doesn't make sense. Both cases indicate that SLOF has a bug. Let's print out an explicit error message and exit since rebooting as we do with other errors would only result in a reset loop. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Fix format specifier that broke 32-bit builds] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	dc1b5eee86	spapr: fix OF word name in comment Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	a4f3885c74	hw/ppc: use 0 instead of fdt_path_offset(fdt, "/") The offset of the root node is guaranteed to be 0. This doesn't fix anything, it's just trivial cleanup of the two remaining places where this was done under hw/ppc. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Greg Kurz	1ec26c757d	spapr: fix the value of SDR1 in kvmppc_put_books_sregs() When running with KVM PR, if a new HPT is allocated we need to inform KVM about the HPT address and size. This is currently done by hacking the value of SDR1 and pushing it to KVM in several places. Also, migration breaks the guest since it is very unlikely the HPT has the same address in source and destination, but we push the incoming value of SDR1 to KVM anyway. This patch introduces a new virtual hypervisor hook so that the spapr code can provide the correct value of SDR1 to be pushed to KVM each time kvmppc_put_books_sregs() is called. It allows to get rid of all the hacking in the spapr/kvmppc code and it fixes migration of nested KVM PR. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Greg Kurz	332f7721cb	spapr: introduce helpers to migrate HPT chunks and the end marker This consolidates some duplicated code in a dedicated helpers. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Greg Kurz	14b0d74887	ppc/kvm: generalize the use of kvmppc_get_htab_fd() The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes() and kvmppc_write_hpte(). This patch modifies kvmppc_get_htab_fd() so that it can be used everywhere we need to access the in-kernel htab: - add an index argument => only kvmppc_read_hptes() passes an actual index, all other users pass 0 - add an errp argument to propagate error messages to the caller. => spapr migration code prints the error => hpte helpers pass &error_abort to keep the current behavior of hw_error() While here, this also fixes a bug in kvmppc_write_hpte() so that it opens the htab fd for writing instead of reading as it currently does. This never broke anything because we currently never call this code, as explained in the changelog of commit c1385933804bb: "This support updating htab managed by the hypervisor. Currently we don't have any user for this feature. This actually bring the store_hpte interface in-line with the load_hpte one. We may want to use this when we want to emulate henter hcall in qemu for HV kvm." The above is still true today. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Greg Kurz	82be8e7394	ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error When kvmppc_get_htab_fd() fails, its return value is propagated up to qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy(). All savevm handlers expect to receive a negative errno on error. Let's patch kvmppc_get_htab_fd() accordingly. While here, let's change htab_load() in the spapr code to also propagate the error, since it doesn't make sense to abort() if we couldn't get the htab fd from KVM. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Igor Mammedov	79e0793614	numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed Calculating default node-ids for CPUs in possible_cpu_arch_ids() is rather fragile since defaults calculation uses nb_numa_nodes but callback might be potentially called early before all -numa CLI options are parsed, which would lead to cpus assigned only upto nb_numa_nodes at the time possible_cpu_arch_ids() is called. Issue was introduced by (`7c88e65` numa: mirror cpu to node mapping in MachineState::possible_cpus) and for example CLI: -smp 4 -numa node,cpus=0 -numa node would set props.node-id in possible_cpus array for every non explicitly mapped CPU to the first node. Issue is not visible to guest nor to mgmt interface due to 1) implictly mapped cpus are forced to the first node in case of partial mapping 2) in case of default mapping possible_cpu_arch_ids() is called after all -numa options are parsed (resulting in correct mapping). However it's fragile to rely on late execution of possible_cpu_arch_ids(), therefore add machine specific callback that returns node-id for CPU and use it to calculate/ set defaults at machine_numa_finish_init() time when all -numa options are parsed. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-09-19 16:51:33 -03:00
Cédric Le Goater	21f3f8db0e	ppc/xive: fix OV5_XIVE_EXPLOIT bits On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility or in XIVE exploitation mode. Now that we have initial guest support for the XIVE interrupt controller, let's fix the bits definition which have evolved in the latest specs. The platform advertises the XIVE Exploitation Mode support using the property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 : - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only - 0b10 XIVE legacy or exploitation mode The OS asks for XIVE Exploitation Mode support using the property "ibm,architecture-vec-5", byte 23 bits 0-1: - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Daniel Henrique Barboza	c86c1affae	hw/ppc/spapr.c: cleaning up qdev_get_machine() calls This patch removes the qdev_get_machine() calls that are made in spapr.c in situations where we can get an existing pointer for the MachineState by either passing it as an argument to the function or by using other already available pointers. The following changes were made: - spapr_node0_size: static function that is called two times: at spapr_setup_hpt_and_vrma and ppc_spapr_init. In both cases we can pass an existing MachineState pointer to it. - spapr_build_fdt: MachineState pointer can be retrieved from the existing sPAPRMachineState pointer. - spapr_boot_set: the opaque in the first arg is a sPAPRMachineState pointer as we can see inside ppc_spapr_init: qemu_register_boot_set(spapr_boot_set, spapr); We can get a MachineState pointer from it. - spapr_machine_device_plug and spapr_machine_device_unplug_request: the MachineState, sPAPRMachineState, MachineClass and sPAPRMachineClass pointers can all be retrieved from the HotplugHandler pointer. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Sam Bobroff	fa98fbfcdf	PPC: KVM: Support machine option to set VSMT mode KVM now allows writing to KVM_CAP_PPC_SMT which has previously been read only. Doing so causes KVM to act, for that VM, as if the host's SMT mode was the given value. This is particularly important on Power 9 systems because their default value is 1, but they are able to support values up to 8. This patch introduces a way to control this capability via a new machine property called VSMT ("Virtual SMT"). If the value is not set on the command line a default is chosen that is, when possible, compatible with legacy systems. Note that the intialization of KVM_CAP_PPC_SMT has changed slightly because it has changed (in KVM) from a global capability to a VM-specific one. This won't cause a problem on older KVMs because VM capabilities fall back to global ones. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	2e886fb391	ppc: spapr: Make VCPU ID handling private to SPAPR The concept of a VCPU ID that differs from the CPU's index (cpu->cpu_index) exists only within SPAPR machines so, move the functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c and rename them appropriately. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	81210c2009	ppc: spapr: Rename cpu_dt_id to vcpu_id This field actually records the VCPU ID used by KVM and, although the value is also used in the device tree it is primarily the VCPU ID so rename it as such. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Updated comment missed in cpu.h] Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Greg Kurz	e2676b1697	spapr: add pseries-2.11 machine type Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Daniel Henrique Barboza	10f12e6450	hw/ppc: CAS reset on early device hotplug This patch is a follow up on the discussions made in patch "hw/ppc: disable hotplug before CAS is completed" that can be found at [1]. At this moment, we do not support CPU/memory hotplug in early boot stages, before CAS. When a hotplug occurs, the event is logged in an internal RTAS event log queue and an IRQ pulse is fired. In regular conditions, the guest handles the interrupt by executing check_exception, fetching the generated hotplug event and enabling the device for use. In early boot, this IRQ isn't caught (SLOF does not handle hotplug events), leaving the event in the rtas event log queue. If the guest executes check_exception due to another hotplug event, the re-assertion of the IRQ ends up de-queuing the first hotplug event as well. In short, a device hotplugged before CAS is considered coldplugged by SLOF. This leads to device misbehavior and, in some cases, guest kernel Ooops when trying to unplug the device. A proper fix would be to turn every device hotplugged before CAS as a colplugged device. This is not trivial to do with the current code base though - the FDT is written in the guest memory at ppc_spapr_reset and can't be retrieved without adding extra state (fdt_size for example) that will need to managed and migrated. Adding the hotplugged DT in the middle of CAS negotiation via the updated DT tree works with CPU devs, but panics the guest kernel at boot. Additional analysis would be necessary for LMBs and PCI devices. There are questions to be made in QEMU/SLOF/kernel level about how we can make this change in a sustainable way. With Linux guests, a fix would be the kernel executing check_exception at boot time, de-queueing the events that happened in early boot and processing them. However, even if/when the newer kernels start fetching these events at boot time, we need to take care of older kernels that won't be doing that. This patch works around the situation by issuing a CAS reset if a hotplugged device is detected during CAS: - the DRC conditions that warrant a CAS reset is the same as those that triggers a DRC migration - the DRC must have a device attached and the DRC state is not equal to its ready_state. With that in mind, this patch makes use of 'spapr_drc_needed' to determine if a CAS reset is needed. - In the middle of CAS negotiations, the function 'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see if there are any DRC that requires a reset, using spapr_drc_needed. If that happens, returns '1' in 'spapr_h_cas_compose_response' which will set spapr->cas_reboot to true, causing the machine to reboot. No changes are made for coldplug devices. [1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza	5625817423	hw/ppc: clear pending_events on machine reset The sPAPR machine isn't clearing up the pending events QTAILQ on machine reboot. This allows for unprocessed hotplug/epow events to persist in the queue after reset and, when reasserting the IRQs in check_exception later on, these will be being processed by the OS. This patch implements a new function called 'spapr_clear_pending_events' that clears up the pending_events QTAILQ. This helper is then called inside ppc_spapr_reset to clear up the events queue, preventing old/deprecated events from persisting after a reset. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Thomas Huth	0479097859	hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev' QEMU currently crashes when trying to use a 'pc-dimm' on the pseries machine without specifying its 'memdev' property. This happens because pc_dimm_get_memory_region() does not check whether the 'memdev' property has properly been set by the user. Looking closer at this function, it's also obvious that it is using &error_abort to call another function - and this is bad in a function that is used in the hot-plugging calling chain since this can also cause QEMU to exit unexpectedly. So let's fix these issues in a proper way now: Add a "Error **errp" parameter to pc_dimm_get_memory_region() which we use in case the 'memdev' property has not been set by the user, and which we can use instead of the &error_abort, and change the callers of get_memory_region() to make use of this "errp" parameter for proper error checking. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-22 21:26:46 +10:00
David Gibson	fc7e0765fc	Revert "spapr: populate device tree depending on XIVE_EXPLOIT option" This reverts commit `b87680427e`. I thought this was a harmless preliminary for XIVE enablement patches we expect later on. However, due to some subtle interactions between qemu and SLOF (guest firmware) this breaks some things. Revert it for now, we'll work out how to fix it when the rest of the XIVE patches are ready. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-29 16:22:14 +10:00
Bharata B Rao	8d5981c4fc	spapr: Fix QEMU abort during memory unplug Commit `0cffce56` (hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState) introduced a new way to track pending LMBs of DIMM device that is marked for removal. Since this commit we can hit the assert in spapr_pending_dimm_unplugs_add() in the following situation: - DIMM device removal fails as the guest doesn't allow the removal. - Subsequent attempt to remove the same DIMM would hit the assert as the corresponding sPAPRDIMMState is still part of the pending_dimm_unplugs list. Fix this by removing the assert and conditionally adding the sPAPRDIMMState to pending_dimm_unplugs list only when it is not already present. Fixes: `0cffce56ae` Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Tweaked to avoid returning NULL when spapr_pending_dimm_unplugs_add() does find an existing entry] Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-25 11:14:25 +10:00
Laurent Vivier	e8cd4247e9	spapr/htab: fix savevm Commit `3a38429` ("spapr: Add a "no HPT" encoding to HTAB migration stream") allows to migrate an empty HPT, but doesn't mark correctly the end of the migration stream. The end condition (value returned by htab_save_iterate()) should be 1, whereas in `3a38429` it returns 0. The problem can be reproduced with QEMU monitor command "savevm": the command never stops and the disk image grows without limit. Fixes: `3a38429748` Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-25 11:14:25 +10:00
Greg Kurz	df8658de43	spapr: fix memory leak in spapr_core_pre_plug() In case of error, we must ensure the dynamically allocated base_core_type is freed, like it is done everywhere else in this function. This is a regression introduced in QEMU 2.9 by commit `8149e2992f`. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	2772cf6be9	pseries: Use smaller default hash page tables when guest can resize We've now implemented a PAPR extension allowing PAPR guest to resize their hash page table (HPT) during runtime. This patch makes use of that facility to allocate smaller HPTs by default. Specifically when a guest is aware of the HPT resize facility, qemu sizes the HPT to the initial memory size, rather than the maximum memory size on the assumption that the guest will resize its HPT if necessary for hot plugged memory. When the initial memory size is much smaller than the maximum memory size (a common configuration with e.g. oVirt / RHEV) then this can save significant memory on the HPT. If the guest does not advertise HPT resize awareness when it makes the ibm,client-architecture-support call, qemu resizes the HPT for maxmimum memory size (unless it's been configured not to allow such guests at all). For now we make that reallocation assuming the guest has not yet used the HPT at all. That's true in practice, but not, strictly, an architectural or PAPR requirement. If we need to in future we can fix this by having the client-architecture-support call reboot the guest with the revised HPT size (the client-architecture-support call is explicitly permitted to trigger a reboot in this way). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-07-17 15:07:05 +10:00
David Gibson	52b81ab5e9	pseries: Enable HPT resizing for 2.10 We've now implemented a PAPR extensions which allows PAPR guests (i.e. "pseries" machine type) to resize their hash page table during runtime. However, that extension is only enabled if explicitly chosen on the command line. This patch enables it by default for spapr-2.10, but leaves it disabled (by default) for older machine types. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-17 15:07:05 +10:00
David Gibson	0b0b831016	pseries: Implement HPT resizing This patch implements hypercalls allowing a PAPR guest to resize its own hash page table. This will eventually allow for more flexible memory hotplug. The implementation is partially asynchronous, handled in a special thread running the hpt_prepare_thread() function. The state of a pending resize is stored in SPAPR_MACHINE->pending_hpt. The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or, if one is already in progress, monitor it for completion. If there is an existing HPT resize in progress that doesn't match the size specified in the call, it will cancel it, replacing it with a new one matching the given size. The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only be called successfully once H_RESIZE_HPT_PREPARE has successfully completed initialization of a new HPT. The guest must ensure that there are no concurrent accesses to the existing HPT while this is called (this effectively means stop_machine() for Linux guests). For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each HPTE into the new HPT. This can have quite high latency, but it seems to be of the order of typical migration downtime latencies for HPTs of size up to ~2GiB (which would be used in a 256GiB guest). In future we probably want to move more of the rehashing to the "prepare" phase, by having H_ENTER and other hcalls update both current and pending HPTs. That's a project for another day, but should be possible without any changes to the guest interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	30f4b05bd0	pseries: Stubs for HPT resizing This introduces stub implementations of the H_RESIZE_HPT_PREPARE and H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR extension to allow run time resizing of a guest's hash page table. It also adds a new machine property for controlling whether this new facility is available. For now we only allow resizing with TCG, allowing it with KVM will require kernel changes as well. Finally, it adds a new string to the hypertas property in the device tree, advertising to the guest the availability of the HPT resizing hypercalls. This is a tentative suggested value, and would need to be standardized by PAPR before being merged. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-17 15:07:05 +10:00
Greg Kurz	e49c63d5b3	spapr: fix potential memory leak in spapr_core_plug() Since commit `5c1da81215` ("spapr: Remove unnecessary differences between hotplug and coldplug paths"), the CPU DT for the DRC is always allocated. This causes a memory leak for pseries-2.6 and older machine types, that don't support CPU hotplug and don't allocate DRCs for CPUs. Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	a8dc47fd82	spapr: Refactor spapr_drc_detach() This function has two unused parameters - remove them. It also sets awaiting_release on all paths, except one. On that path setting it is harmless, since it will be immediately cleared by spapr_drc_release(). So factor it out of the if statements. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	765d1bdda5	spapr: Simplify unplug path spapr_lmb_release() and spapr_core_release() call hotplug_handler_unplug() which after a bunch of indirection calls spapr_memory_unplug() or spapr_core_unplug(). But we already know which is the appropriate thing to call here, so we can just fold it directly into the release function. Once that's done, there's no need for an hc->unplug method in the spapr machine at all: since we also have an hc->unplug_request method, the hotplug core will never use ->unplug. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
Laurent Vivier	94fd9cbaa3	spapr: Treat devices added before inbound migration as coldplugged When migrating a guest which has already had devices hotplugged, libvirt typically starts the destination qemu with -incoming defer, adds those hotplugged devices with qmp, then initiates the incoming migration. This causes problems for the management of spapr DRC state. Because the device is treated as hotplugged, it goes into a DRC state for a device immediately after it's plugged, but before the guest has acknowledged its presence. However, chances are the guest on the source machine has acknowledged the device's presence and configured it. If the source has fully configured the device, then DRC state won't be sent in the migration stream: for maximum migration compatibility with earlier versions we don't migrate DRCs in coldplug-equivalent state. That means that the DRC effectively changes state over the migrate, causing problems later on. In addition, logging hotplug events for these devices isn't what we want because a) those events should already have been issued on the source host and b) the event queue should get wiped out by the incoming state anyway. In short, what we really want is to treat devices added before an incoming migration as if they were coldplugged. To do this, we first add a spapr_drc_hotplugged() helper which determines if the device is hotplugged in the sense relevant for DRC state management. We only send hotplug events when this is true. Second, when we add a device which isn't hotplugged in this sense, we force a reset of the DRC state - this ensures the DRC is in a coldplug-equivalent state (there isn't usually a system reset between these device adds and the incoming migration). This is based on an earlier patch by Laurent Vivier, cleaned up and extended. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	5341258e86	spapr: Minor cleanups to events handling The rtas_error_log structure is marked packed, which strongly suggests its precise layout is important to match an external interface. Along with that one could expect it to have a fixed endianness to match the same interface. That used to be the case - matching the layout of PAPR RTAS event format and requiring BE fields. Now, however, it's only used embedded within sPAPREventLogEntry with the fields in native order, since they're processed internally. Clear that up by removing the nested structure in sPAPREventLogEntry. struct rtas_error_log is moved back to spapr_events.c where it is used as a temporary to help convert the fields in sPAPREventLogEntry to the correct in memory format when delivering an event to the guest. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
Daniel Henrique Barboza	fd38804b38	spapr: migrate pending_events of spapr state In racing situations between hotplug events and migration operation, a rtas hotplug event could have not yet be delivered to the source guest when migration is started. In this case the pending_events of spapr state need be transmitted to the target so that the hotplug event can be finished on the target. To achieve the minimal VMSD possible to migrate the pending_events list, this patch makes the changes in spapr_events.c: - 'log_type' of sPAPREventLogEntry struct deleted. This information can be derived by inspecting the rtas_error_log summary field. A new function called 'spapr_event_log_entry_type' was added to retrieve the type of a given sPAPREventLogEntry. - sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The only data we're going to migrate in the VMSD is the event log data itself, which can be divided in two parts: a rtas_error_log header and an extended event log field. The rtas_error_log header contains information about the size of the extended log field, which can be used inside VMSD as the size parameter of the VBUFFER_ALOC field that will store it. To allow this use, the header.extended_length field must be exposed inline to the VMSD instead of embedded into a 'data' field that holds everything. With this in mind, the following changes were done: * a new 'header' field was added to sPAPREventLogEntry. This field holds a a struct rtas_error_log inline. * the declaration of the 'rtas_error_log' struct was moved to spapr.h to be visible to the VMSD macros. * 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and now holds only the contents of the extended event log. * 'struct rtas_error_log hdr' were taken away from both epow_log_full and hp_log_full. This information is now available at the header field of sPAPREventLogEntry. * epow_log_full and hp_log_full were renamed to epow_extended_log and hp_extended_log respectively. This rename makes it clearer to understand the new purpose of both structures: hold the information of an extended event log field. * spapr_powerdown_req and spapr_hotplug_req_event now creates a sPAPREventLogEntry structure that contains the full rtas log entry. * rtas_event_log_queue and rtas_event_log_dequeue now receives a sPAPREventLogEntry pointer as a parameter instead of a void pointer. - the endianess of the sPAPREventLogEntry header is now native instead of be32. We can use the fields in native endianess internally and write them in be32 in the guest physical memory inside 'check_exception'. This allows the VMSD inside spapr.c to read the correct size of the entended_log field. - inside spapr.c, pending_events is put in a subsection in the spapr state VMSD to make sure migration across different versions is not broken. A small change in rtas_event_log_queue and rtas_event_log_dequeue were also made: instead of calling qdev_get_machine(), both functions now receive a pointer to the sPAPRMachineState. This pointer is already available in the callers of these functions and we don't need to waste resources calling qdev() again. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
Alistair Francis	3dc6f86936	Convert error_report() to warn_report() Convert all uses of error_report("warning:"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these two commands: find ./* -type f -exec sed -i \ 's\|error_report(".*warning[,:] \|warn_report("\|Ig' {} + Indentation fixed up manually afterwards. The test-qdev-global-props test case was manually updated to ensure that this patch passes make check (as the test cases are case sensitive). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Thomas Huth <thuth@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Lieven <pl@kamp.de> Cc: Josh Durgin <jdurgin@redhat.com> Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Greg Kurz <groug@kaod.org> Cc: Rob Herring <robh@kernel.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Peter Chubb <peter.chubb@nicta.com.au> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Greg Kurz <groug@kaod.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au> Acked-by: Max Reitz <mreitz@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-07-13 13:49:58 +02:00
Peter Maydell	aa916e409c	ppc patch queue 2017-07-11 * Several minor cleanups from Greg Kurz * Fix for migration of pseries-2.7 and earlier machine types * More reworking of the DRC hotplug code, fixing several problems though there are still more to go * Fixes for CPU family / alias handling on POWER9 * Preliminary patches for POWER9 XIVE (new interrupt controller) support * Assorted other fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZZFWEAAoJEGw4ysog2bOSxgAQAI85Vv8RuK1mgN0w0aIguP09 JIM+iZ3zJwSFM3A/D8CnWxMGEQkjkVfKWT8cB97v5vPGTu21WD2hdQ26ZrcjC8Do Y5sPuCGRRSZvz+tnz17HU2aZMQwteNNgdes9MGr61kdVUk+1uvcyqTdhqxka5rF7 SYcIEf95+Fcu00+bhwGaGg0ZXHer4rSTjDXbT3CcxT64sgQW8X36SceFBkFH0P40 tX1bn9gdQgBNOT11O0MNeq6ewxHhSSusTwyYXpHTvK6p0EXPqfm+vM9dQSmXeKsk T7/yDmKplutVnWlfbxrdG+wp+ObE1h7KljGdWLx4jIX58dHVvjDJ+kZ+OJbcb6Xj oEV947tYkZaDC7q7TkwXjYltbq+A6HFFKEwxJ59L4zYgVYVkTUMRJ3Apl66sq5a1 SHEBXAA5SDq8jxdKKqvwzh4ZtkkxIelOO8lTVjOAg8ffcNfEwbJOuom2h0kgzOgz Sn2PxC/jwk2RZZ4T+qe1KNpVbV3RYpGanMXYDMFUnTRw2RAU2io0R2bBwOlm/0I7 ZUrjD2xCFrMPuthxr5/5/w0P1StALVN50S5YqWvDuQYIbMYhSjSh3tDgAHVrqL4W Yc1Zr5X9X91qgUjAkejBuirvWLvgofiw8jlqAZ6K2zTUcvtn0KdQGe7eiK+wostA PhLW9tYrkpt/BmzEMi1X =8Wy2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170711' into staging ppc patch queue 2017-07-11 * Several minor cleanups from Greg Kurz * Fix for migration of pseries-2.7 and earlier machine types * More reworking of the DRC hotplug code, fixing several problems though there are still more to go * Fixes for CPU family / alias handling on POWER9 * Preliminary patches for POWER9 XIVE (new interrupt controller) support * Assorted other fixes # gpg: Signature made Tue 11 Jul 2017 05:35:16 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170711: spapr: populate device tree depending on XIVE_EXPLOIT option spapr: introduce the XIVE_EXPLOIT option in CAS ppc/kvm: have the "family" CPU alias to point to TYPE_HOST_POWERPC_CPU spapr: Only report host/guest IOMMU page size mismatches on KVM spapr: fix memory hotplug error path target/ppc: Add debug function for radix mmu translation target/ppc: Refactor tcg radix mmu code spapr: Use unplug_request for PCI hot unplug spapr: Remove unnecessary differences between hotplug and coldplug paths spapr: Add DRC release method spapr: Uniform DRC reset paths spapr: Leave DR-indicator management to the guest target-ppc: SPR_BOOKE_ESR not set on FP exceptions spapr: fix migration to pseries machine < 2.8 spapr: fix bogus function name in comment spapr: refresh "platform-specific" hcalls comment spapr: make spapr_populate_hotplug_cpu_dt() static Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-11 16:34:09 +01:00
Cédric Le Goater	b87680427e	spapr: populate device tree depending on XIVE_EXPLOIT option When XIVE is supported, the device tree should be populated accordingly and the XIVE memory regions mapped to activate MMIOs. Depending on the design we choose, we could also allocate different ICS and ICP objects, or switch between objects. This needs to be discussed. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:02 +10:00
Cédric Le Goater	f2b14e3a9f	spapr: introduce the XIVE_EXPLOIT option in CAS On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility (the former POWER8 interrupt model) or in XIVE exploitation mode (the newer POWER9 interrupt model). Bit 7 of Byte 23 of vector 5 is used for this purpose. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:02 +10:00
Greg Kurz	160bb67885	spapr: fix memory hotplug error path QEMU shouldn't abort if spapr_add_lmbs()->spapr_drc_attach() fails. Let's propagate the error instead, like it is done everywhere else where spapr_drc_attach() is called. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:02 +10:00
David Gibson	5c1da81215	spapr: Remove unnecessary differences between hotplug and coldplug paths spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into configured state initially, instead of the usual ISOLATED/UNUSABLE state. It turns out this is unnecessary: although coldplugged devices do need to be in CONFIGURED state once the guest starts, that will already be accomplished by the reset code which will move DRCs for already plugged devices into a coldplug equivalent state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-07-11 11:04:01 +10:00
David Gibson	6caf3ac613	spapr: Uniform DRC reset paths DRC objects have a regular device reset method. However, it only gets called in the usual way for PCI DRCs. Because of where CPU and LMB DRCs are in the QOM tree, their device reset method isn't automatically called. So, the machine manually registers reset handlers to call device_reset(). This patch removes the device reset method, and instead always explicitly registers the reset handler from realize(). This means the callers don't have to worry about the two cases, and we always get proper resets. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-11 11:04:01 +10:00
Greg Kurz	f3728f9cbb	spapr: fix bogus function name in comment $ git grep spapr_ppc_reset hw/ppc/spapr.c: * as part of spapr_ppc_reset(). $ git grep ppc_spapr_reset hw/ppc/spapr.c:static void ppc_spapr_reset(void) hw/ppc/spapr.c: mc->reset = ppc_spapr_reset; hw/ppc/spapr_hcall.c: /* If ppc_spapr_reset() did not set up a HPT but one is necessary Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:01 +10:00
Greg Kurz	04d0ffbd52	spapr: make spapr_populate_hotplug_cpu_dt() static Since commit `ff9006ddbf` ("spapr: move spapr_core_[foo]plug() callbacks close to machine code in spapr.c"), this function doesn't need to be extern anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:01 +10:00
Juan Quintela	70f794fcfa	migration: Rename cleanup() to save_cleanup() We need a cleanup for loads, so we rename here to be consistent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Rename htab_cleanup to htap_save_cleanup as dave suggestion Message-Id: <20170628095228.4661-3-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	9907e842d7	migration: Rename save_live_setup() to save_setup() We are going to use it now for more than save live regions. Once there rename qemu_savevm_state_begin() to qemu_savevm_state_setup(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170628095228.4661-2-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
David Gibson	4f9242fc93	spapr: Make DRC reset force DRC into known state The reset handler for DRCs attempts several state transitions which are subject to various checks and restrictions. But at reset time we know there is no guest, so we can ignore most of the usual sequencing rules and just set the DRC back to a known state. In fact, it's safer to do so. The existing code also has several redundant checks for drc->awaiting_release inside a block which has already tested that. This patch removes those and sets the DRC to a fixed initial state based only on whether a device is currently plugged or not. With DRCs correctly reset to a state based on device presence, we don't need to force state transitions as cold plugged devices are processed. This allows us to remove all the callers of the set_*_state() methods from outside spapr_drc.c. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
Daniel Henrique Barboza	aca8bf9f1c	hw/ppc/spapr.c: consecutive 'spapr->patb_entry = 0' statements In ppc_spapr_reset(), if the guest is using HPT, the code was executing: } else { spapr->patb_entry = 0; spapr_setup_hpt_and_vrma(spapr); } And, at the end of spapr_setup_hpt_and_vrma: /* We're setting up a hash table, so that means we're not radix */ spapr->patb_entry = 0; Resulting in spapr->patb_entry being assigned to 0 twice in a row. Given that 'spapr_setup_hpt_and_vrma' is also called inside 'spapr_check_setup_free_hpt' of spapr_hcall.c, this trivial patch removes the 'patb_entry = 0' assignment from the 'else' clause inside ppc_spapr_reset to avoid this behavior. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Greg Kurz	46f7afa370	spapr: fix migration of ICPState objects from/to older QEMU Commit `5bc8d26de2` ("spapr: allocate the ICPState object from under sPAPRCPUCore") moved ICPState objects from the machine to CPU cores. This is an improvement since we no longer allocate ICPState objects that will never be used. But it has the side-effect of breaking migration of older machine types from older QEMU versions. This patch allows spapr to register dummy "icp/server" entries to vmstate. These entries use a dedicated VMStateDescription that can swallow and discard state of an incoming migration stream, and that don't send anything on outgoing migration. As for real ICPState objects, the instance_id is the cpu_index of the corresponding vCPU, which happens to be equal to the generated instance_id of older machine types. The machine can unregister/register these entries when CPUs are dynamically plugged/unplugged. This is only available for pseries-2.9 and older machines, thanks to a compat property. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Bharata B Rao	d39c90f5f3	spapr: Fix migration of Radix guests Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Bharata B Rao	3a38429748	spapr: Add a "no HPT" encoding to HTAB migration stream Add a "no HPT" encoding (using value -1) to the HTAB migration stream (in the place of HPT size) when the guest doesn't allocate HPT. This will help the target side to match target HPT with the source HPT and thus enable successful migration. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
David Gibson	d5fc133eed	ppc: Rework CPU compatibility testing across migration Migrating between different CPU versions is a bit complicated for ppc. A long time ago, we ensured identical CPU versions at either end by checking the PVR had the same value. However, this breaks under KVM HV, because we always have to use the host's PVR - it's not virtualized. That would mean we couldn't migrate between hosts with different PVRs, even if the CPUs are close enough to compatible in practice (sometimes identical cores with different surrounding logic have different PVRs, so this happens in practice quite often). So, we removed the PVR check, but instead checked that several flags indicating supported instructions matched. This turns out to be a bad idea, because those instruction masks are not architected information, but essentially a TCG implementation detail. So changes to qemu internal CPU modelling can break migration - this happened between qemu-2.6 and qemu-2.7. That was addressed by `146c11f1` "target-ppc: Allow eventual removal of old migration mistakes". Now, verification of CPU compatibility across a migration basically doesn't happen. We simply ignore the PVR of the incoming migration, and hope the cpu on the destination is close enough to work. Now that we've cleaned up handling of processor compatibility modes for pseries machine type, we can do better. For new machine types (pseries-2.10+) We allow migration if: * The source and destination PVRs are for the same type of CPU, as determined by CPU class's pvr_match function OR * When the source was in a compatibility mode, and the destination CPU supports the same compatibility mode For older machine types we retain the existing behaviour - current CAS code will usually set a compat mode which would break backwards migration if we made them use the new behaviour. [Fixed from an earlier version by Greg Kurz]. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
David Gibson	66d5c492dd	pseries: Reset CPU compatibility mode Currently, the CPU compatibility mode is set when the cpu is initialized, then again when the guest negotiates features. This means if a guest negotiates a compatibility mode, then reboots, that compatibility mode will be retained across the reset. Usually that will get overridden when features are negotiated on the next boot, but it's still not really correct. This patch moves the initial set up of the compatibility mode from cpu init to reset time. The mode is retained if the reboot was caused by the feature negotiation (it might be important in that case, though it's unlikely). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
David Gibson	7843c0d60d	pseries: Move CPU compatibility property to machine Server class POWER CPUs have a "compat" property, which is used to set the backwards compatibility mode for the processor. However, this only makes sense for machine types which don't give the guest access to hypervisor privilege - otherwise the compatibility level is under the guest's control. To reflect this, this removes the CPU 'compat' property and instead creates a 'max-cpu-compat' property on the pseries machine. Strictly speaking this breaks compatibility, but AFAIK the 'compat' option was never (directly) used with -device or device_add. The option was used with -cpu. So, to maintain compatibility, this patch adds a hack to the cpu option parsing to strip out any compat options supplied with -cpu and set them on the machine property instead of the now deprecated cpu property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
Peter Xu	15c3850325	migration: move skip_section_footers Move it into MigrationState, revert its meaning and renaming it to send_section_footer, with a property bound to it. Same trick is played like previous patches. Removing savevm_skip_section_footers(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-9-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:39 +02:00
Peter Xu	71dd4c1a56	migration: move skip_configuration out It was in SaveState but now moved to MigrationState altogether, reverted its meaning, then renamed to "send_configuration". Again, using HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for xen_init(). Removing savevm_skip_configuration(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-8-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Peter Xu	5272298c48	migration: move global_state.optional out Put it into MigrationState then we can use the properties to specify whether to enable storing global state. Removing global_state_set_optional() since now we can use HW_COMPAT_2_3 for x86/power, and AccelClass.global_props for Xen. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-6-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Marc-André Lureau	9ed442b8ae	pc-dimm: use get_uint() for dimm properties TYPE_PC_DIMM's property PC_DIMM_ADDR_PROP is defined with DEFINE_PROP_UINT64(). TYPE_PC_DIMM's property PC_DIMM_NODE_PROP is defined with DEFINE_PROP_UINT32(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170607163635.17635-22-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-06-20 14:31:32 +02:00
Peter Maydell	735286a4f8	migration/next for 20170613 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZP6n5AAoJEPSH7xhYctcj04oQAJczMfc2X8vTwII6lN9klf+T Cy32B4WB8FBO9M7oJYD/yytJ3ibcLuMwKwTy/GGfaTspuYDI/HrplUD3Pt+trDPc fUxmTNjK9vE9foPAwOTSwTGsdOp5ICoZuDjHTj8gtHmfFLclDxxJMojtthMJ1Csc qn9oJzjLn3izn8C6CY6oXGnqOt6gy2lz+RqNKlve/bwxaVdQIXTXCVsLWwQZuj48 VI9qAFw9TsgSBi9dlTYpVfdMvItO73SVYd2c1ETzL0YSNK3S/Yhpww7fyK8TQNpO Y8xXMMBMybHZej1ixHXh01CRmEnBZXpjLCIXnWwxQGXxTH8p7F+W1+lhDTL4IIXR Py0EwiPUj4sPyTW2htSnDBRtE1uHcJlDtsFAAmsEqfeASet7ueE2bkfKwWUftqTs GZ7ikseIb9F0eQKjecYcEfaLtYNn+0UflgVkimW1gXIeuO58VYLpa8vdiUV3eKJn UCDDHGYKf7QJQLpSzYWXGRT4HJOQvaCbJ0a03hKceYyLB6rJv96khajirbczKZ92 cja0EJfDy5S9fBulWRveHKLUAFMrR3zA4DhlK0pb591uIs4iMcKH3egHQZpv0uf0 iifWNI+AFuorhQfdhV2G4Zg1g/fwI2RRJK7HdBOklulUrcr0caPvjjGdbA3Q0Hf6 u61pWdr+Yb3XPaqlC2AH =EFHC -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170613' into staging migration/next for 20170613 # gpg: Signature made Tue 13 Jun 2017 10:01:45 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20170613: migration: Move migration.h to migration/ migration: Move remaining exported functions to migration/misc.h migration: create global_state.c migration: ram_control_* are implemented in qemu_file migration: Commands are only used inside migration.c migration: Move constants to savevm.h migration: Move dump_vmsate_json_to_file() to misc.h migration: Split registration functions from vmstate.h migration: Move self_announce_delay() to misc.h migration: Remove MigrationState from migration_channel_incomming() ram: Now POSTCOPY_ACTIVE is the same that STATUS_ACTIVE ram: Print block stats also in the complete case migration: Don't try to set *errp directly migration: isolate return path on src Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-13 13:51:29 +01:00
Juan Quintela	c4b63b7cc5	migration: Move remaining exported functions to migration/misc.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	84a899de8c	migration: create global_state.c It don't belong anywhere else, just the global state where everybody can stick other things. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	f2a8f0a631	migration: Split registration functions from vmstate.h They are indpendent, and nowadays almost every device register things with qdev->vmsd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-13 11:00:44 +02:00
Greg Kurz	ad265631c0	xics: introduce macros for ICP/ICS link properties These properties are part of the XICS API. They deserve to appear explicitely in the XICS header file. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:12:34 +10:00
Thomas Huth	4871dd4c3f	hw/ppc/spapr: Adjust firmware name for PCI bridges SLOF uses "pci" as name for PCI bridges nodes in the device tree instead of "pci-bridges", so booting via bootindex from a device behind a PCI bridge currently does not work since QEMU passes the wrong name in the "qemu,boot-list" property. Fix it by changing the name of the PCI bridge nodes to "pci" instead. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1459170 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-08 14:38:27 +10:00
David Gibson	0be4e88621	spapr: Change DRC attach & detach methods to functions DRC objects have attach & detach methods, but there's only one implementation. Although there are some differences in its behaviour for different DRC types, the overall structure is the same, so while we might want different method implementations for some parts, we're unlikely to want them for the top-level functions. So, replace them with direct function calls. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	454b580ae9	spapr: Don't misuse DR-indicator in spapr_recover_pending_dimm_state() With some combinations of migration and hotplug we can lost temporary state indicating how many DRCs (guest side hotplug handles) are still connected to a DIMM object in the process of removal. When we hit that situation spapr_recover_pending_dimm_state() is used to scan more extensively and work out the right number. It does this using drc->indicator state to determine what state of disconnection the DRC is in. However, this is not safe, because the indicator state is guest settable - in fact it's more-or-less a purely guest->host notification mechanism which should have no bearing on the internals of hotplug state management. So, replace the test for this with a test on drc->dev, which is a purely qemu side managed variable, and updated the same BQL critical section as the indicator state. This does introduce an off-by-one change, because the indicator state was updated before the call to spapr_lmb_release() on the current DRC, whereas drc->dev is updated afterwards. That's corrected by always decrementing the nr_lmbs value instead of only doing so in the case where we didn't have to recover information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
Greg Kurz	8a9e0e7b89	spapr: fix memory leak in spapr_memory_pre_plug() The string returned by object_property_get_str() is dynamically allocated. (Spotted by Coverity, CID 1375942) Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-08 11:05:31 +10:00
Peter Maydell	e02bbe1956	ppc patch queue 2017-06-06 Accumulated patches for ppc targets and the pseries machine type. The big thing in this batch is a start on a substantial cleanup of the pseries hotplug mechanisms, which were pretty confusing. For now these shouldn't cause substantial behavioural changes, but I am hoping these lead to clearer code and eventually to fixes for the bugs we have in hotplug handling, particularly when hotplug and migration are combined. The remaining patches are mostly bugfixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZNhgSAAoJEGw4ysog2bOSERwP/A7T7UJ8XXWit9QXGCi+G83w +RUuHxjA9qFqrg1zYqFyLg3ctGl93Sxu7mzI5MOIKwAVXlTsE6+84TH7zBc18DPB fekPWmzJ6jfiVO+1Zg1JPWorMfIHDDc2v6Q6qPfD8KWbt02yPfrXbKlivQB4hVZ4 Qb4VJdjZgBDcVy79xhcW5k6v8dVw8PdSyDmkQrBhccI0noLerhI41Mgt7QQaWQRH Le3ziexUpWelVCRQB0FqE/PIWo2+NY/e0pumX7Aqtjs/G35KjOXy0ja3yKLjfeUW Z4NugIO2I2hncERa68YFar/BqG26DX8KCErNMDkn7LyZcoDAQWhcDH+65G1BNuf2 jW+KApMNm+N1vXabbz8P9BbLjuZpRQQhyPOxB3I8UGaTYGtCPe/lUCe2/V8EbKNa VFavc1UuLftOZuJj/rYGJeU/4JBU6srbAKCO3VVK4Tnd8DyiT3QCpUWEkjv+J6jo co35oYBavLfQPMr+rsX15lgbmZwg7iBV+dgKLa2+cwmKXzCf7aYe38aJy7nRBmhb ivhH3bKtdysy0qq4UYaCgW06qQcVF0QMJaxFQ0X7I+GBNwHA7wdZD/i6IMcO6Z7H 7gQdavBTdukgKb2+pVjR58H13ieHXuBxktonhOz70rvEDVa4xx8pxhnZlpSiH2ha RzpkhanrwEeECG6Lke/3 =QDWB -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170606' into staging ppc patch queue 2017-06-06 Accumulated patches for ppc targets and the pseries machine type. The big thing in this batch is a start on a substantial cleanup of the pseries hotplug mechanisms, which were pretty confusing. For now these shouldn't cause substantial behavioural changes, but I am hoping these lead to clearer code and eventually to fixes for the bugs we have in hotplug handling, particularly when hotplug and migration are combined. The remaining patches are mostly bugfixes. # gpg: Signature made Tue 06 Jun 2017 03:48:50 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170606: spapr: Remove some non-useful properties on DRC objects spapr: Eliminate spapr_drc_get_type_str() spapr: Move configure-connector state into DRC spapr: Clean up spapr_dr_connector_by_() spapr: Introduce DRC subclasses spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs spapr: Allow boot from vhost--scsi backends ppc/pnv: check the return value of fdt_setprop() spapr_nvram: Check return value from blk_getlength() target/ppc: Fixup set_spr error in h_register_process_table target-ppc: Fix openpic timer read register offset spapr: Make DRC get_index and get_type methods into plain functions spapr: Abolish DRC set_configured method spapr: Abolish DRC get_fdt method spapr: Move DRC RTAS calls into spapr_drc.c migration: Mark CPU states dirty before incoming migration/loadvm migration: remove register_savevm() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-06 14:30:06 +01:00
David Gibson	b8fdd530be	spapr: Move configure-connector state into DRC Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector structures which store intermediate state for the ibm,configure-connector RTAS call. This was an attempt to separate this state from the core of the DRC state. However the configure connector process is intimately tied to the DRC model, so there's really no point trying to have two levels of interface here. Moving the configure-connector state into its corresponding DRC allows removal of a number of helpers for maintaining the anciliary list. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:17 +10:00
David Gibson	fbf5539718	spapr: Clean up spapr_dr_connector_by_() Change names to something less ludicrously verbose * Now that we have QOM subclasses for the different DRC types, use a QOM typename instead of a PAPR type value parameter The latter allows removal of the get_type_shift() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:08 +10:00
David Gibson	2d33581899	spapr: Introduce DRC subclasses Currently we only have a single QOM type for all DRCs, but lots of places where we switch behaviour based on the DRC's PAPR defined type. This is a poor use of our existing type system. So, instead create QOM subclasses for each PAPR defined DRC type. We also introduce intermediate subclasses for physical and logical DRCs, a division which will be useful later on. Instead of being stored in the DRC object itself, the PAPR type is now stored in the class structure. There are still many places where we switch directly on the PAPR type value, but this at least provides the basis to start to remove those. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:23:46 +10:00
Felipe Franciosi	c4e13492af	spapr: Allow boot from vhost--scsi backends The current implementation of spapr_get_fw_dev_path() doesn't take into consideration vhost--scsi devices. This makes said devices unbootable on PPC as SLOF is unable to work out the path to scan boot disks. This makes VMs bootable on spapr when using vhost-*-scsi by implementing a disk path for VHostSCSICommon (which currently includes both vhost-user-scsi and vhost-scsi). Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Mike Cui <cui@nutanix.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-06 09:19:01 +10:00
David Gibson	0b55aa91c9	spapr: Make DRC get_index and get_type methods into plain functions These two methods only have one implementation, and the spec they're implementing means any other implementation is unlikely, verging on impossible. So replace them with simple functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
Igor Mammedov	99861ecbc5	spapr: cleanup spapr_fixup_cpu_numa_dt() usage even though spapr_fixup_cpu_numa_dt() has no effect on FDT if numa is disabled, don't call it uselessly. It makes it obvious at call sites that function is needed only when numa is enabled. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-7-git-send-email-imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:09 -03:00
Igor Mammedov	15f8b14228	numa: move numa_node from CPUState into target specific classes Move vcpu's associated numa_node field out of generic CPUState into inherited classes that actually care about cpu<->numa mapping, i.e: ARMCPU, PowerPCCPU, X86CPU. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-6-git-send-email-imammedo@redhat.com> [ehabkost: s/CPU is belonging to/CPU belongs to/ on comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:09 -03:00
Igor Mammedov	a0ceb640d0	numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com> [ehabkost: Fix indentation] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:08 -03:00
Daniel Henrique Barboza	16ee99805e	hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release When a LMB hot unplug starts, the current DRC LMB status is stored at spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus if a migration occurs in the middle of a LMB unplug the spapr_lmb_release callback will lost track of the LMB unplug progress. This patch implements a new recover function spapr_recover_pending_dimm_state that is used inside spapr_lmb_release to recover this DRC LMB release status that is lost during the migration. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic changes, simplify error handling] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza	318347234d	hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This information can also be retrieved simply by checking drc->type and choosing the right callback based on it. In this context, detach_cb is redundant information that must be managed. After the previous spapr_lmb_release change, no detach_cb_opaques are being used by any of the three callbacks functions. This is yet another information that is now unused and, on top of that, can't be migrated either. This patch makes the following changes: - removal of detach_cb_opaque. the 'opaque' argument was removed from the callbacks and from the detach() function of sPAPRConnectorClass. The attribute detach_cb_opaque of sPAPRConnector was removed. - removal of detach_cb from the detach() call. The function pointer detach_cb of sPAPRConnector was removed. detach() now uses a switch(drc->type) to execute the apropriate callback. To achieve this, spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb callbacks were made public to be visible inside detach(). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
David Gibson	0cffce56ae	hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState The LMB DRC release callback, spapr_lmb_release(), uses an opaque parameter, a sPAPRDIMMState struct that stores the current LMBs that are allocated to a DIMM (nr_lmbs). After each call to this callback, the nr_lmbs is decremented by one and, when it reaches zero, the callback proceeds with the qdev calls to hot unplug the LMB. Using drc->detach_cb_opaque is problematic because it can't be migrated in the future DRC migration work. This patch makes the following changes to eliminate the usage of this opaque callback inside spapr_lmb_release: - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new attribute called 'addr' was added to it. This is used as an unique identifier to associate a sPAPRDIMMState to a PCDIMM element. - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'. This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs that are currently going under an unplug process. - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address was created to fetch the address of a PCDIMM device inside spapr_lmb_release. When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs. After these changes, the opaque argument for spapr_lmb_release is now unused and is passed as NULL inside spapr_del_lmbs. This and the other opaque arguments can now be safely removed from the code. As an additional cleanup made by this patch, the spapr_del_lmbs function was merged with spapr_memory_unplug_request. The former was being called only by the latter and both were small enough to fit one single function. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic cleanups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:28 +10:00
Laurent Vivier	c871bc70bb	spapr: add pre_plug function for memory This allows to manage errors before the memory has started to be hotplugged. We already have the function for the CPU cores. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Fixed a couple of style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 17:27:39 +10:00
David Gibson	459264ef24	pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types As of pseries-2.7 and later, we require the total number of guest vcpus to be a multiple of the threads-per-core. pseries-2.6 and earlier machine types, however, are supposed to allow this for the sake of migration from old qemu versions which allowed this. Unfortunately, `8149e29` "pseries: Enforce homogeneous threads-per-core" broke this by not considering the old machine type case. This fixes it by only applying the check when the machine type supports hotpluggable cpus. By not-entirely-coincidence, that corresponds to the same time when we started enforcing total threads being a multiple of threads-per-core. Fixes: `8149e2992f` Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2017-05-24 11:39:53 +10:00
Greg Kurz	3d85885a1b	spapr: fix error reporting in xics_system_init() If the user explicitely asked for kernel-irqchip support and "xics-kvm" initialization fails, we shouldn't fallback to emulated "xics" as we do now. It is also awkward to print an error message when we have an errp pointer argument. Let's use the errp argument to report the error and let the caller decide. This simplifies the code as we don't need a local Error * here. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Greg Kurz	07572c0653	spapr: ensure core_slot isn't NULL in spapr_core_unplug() If we go that far on the path of hot-removing a core and we find out that the core-id is invalid, then we have a serious bug. Let's make it explicit with an assert() instead of dereferencing a NULL pointer. This fixes Coverity issue CID 1375404. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Bharata B Rao	06ec79e865	spapr: Consolidate HPT freeing code into a routine Consolidate the code that frees HPT into a separate routine spapr_free_hpt() as the same chunk of code is called from two places. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	175d2aa038	spapr: sanitize error handling in spapr_ics_create() The spapr_ics_create() function handles errors in a rather convoluted way, with two local Error * variables. Moreover, failing to parent the ICS object to the machine should be considered as a bug but it is currently ignored. This patch addresses both issues. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	f63ebfe0ac	ppc/xics: simplify prototype of xics_spapr_init() This function only does hypercall and RTAS-call registration, and thus never returns an error. This patch adapt the prototype to reflect that. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Stefan Hajnoczi	ba9915e1f8	x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j 2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11 q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT 4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI /ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07 plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ /QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC CfdxWB5kYM6/lLbOHj94 =48wH -----END PGP SIGNATURE----- Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification # gpg: Signature made Thu 11 May 2017 08:16:07 PM BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # gpg: Note: This key has expired! # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * ehabkost/tags/x86-and-machine-pull-request: (29 commits) migration/i386: Remove support for pre-0.12 formats vmstatification: i386 FPReg migration/i386: Remove old non-softfloat 64bit FP support tests: check -numa node,cpu=props_list usecase numa: add '-numa cpu,...' option for property based node mapping numa: remove node_cpu bitmaps as they are no longer used numa: use possible_cpus for not mapped CPUs check machine: call machine init from wrapper numa: remove no longer need numa_post_machine_init() tests: numa: add case for QMP command query-cpus QMP: include CpuInstanceProperties into query_cpus output output virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() numa: do default mapping based on possible_cpus instead of node_cpu bitmaps numa: mirror cpu to node mapping in MachineState::possible_cpus numa: add check that board supports cpu_index to node mapping virt-arm: add node-id property to CPU pc: add node-id property to CPU spapr: add node-id property to sPAPR core ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-15 14:12:03 +01:00
Igor Mammedov	722387e78d	spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() it's safe to remove thread node_id != core node_id error branch as machine_set_cpu_numa_node() also does mismatch check and is called even before any CPU is created. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-10-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:49 -03:00
Igor Mammedov	0b8497f08c	spapr: add node-id property to sPAPR core it will allow switching from cpu_index to core based numa mapping in follow up patches. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:48 -03:00
Igor Mammedov	ea089eebbd	numa: move source of default CPUs to NUMA node mapping into boards Originally CPU threads were by default assigned in round-robin fashion. However it was causing issues in guest since CPU threads from the same socket/core could be placed on different NUMA nodes. Commit `fb43b73b` (pc: fix default VCPU to NUMA node mapping) fixed it by grouping threads within a socket on the same node introducing cpu_index_to_socket_id() callback and commit `20bb648d` (spapr: Fix default NUMA node allocation for threads) reused callback to fix similar issues for SPAPR machine even though socket doesn't make much sense there. As result QEMU ended up having 3 default distribution rules used by 3 targets /virt-arm, spapr, pc/. In effort of moving NUMA mapping for CPUs into possible_cpus, generalize default mapping in numa.c by making boards decide on default mapping and let them explicitly tell generic numa code to which node a CPU thread belongs to by replacing cpu_index_to_socket_id() with @cpu_index_to_instance_props() which provides default node_id assigned by board to specified cpu_index. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:48 -03:00
Laurent Vivier	3bfe57165b	numa: equally distribute memory on nodes When there are more nodes than available memory to put the minimum allowed memory by node, all the memory is put on the last node. This is because we put (ram_size / nb_numa_nodes) & ~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this case the value is 0. This is particularly true with pseries, as the memory must be aligned to 256MB. To avoid this problem, this patch uses an error diffusion algorithm [1] to distribute equally the memory on nodes. We introduce numa_auto_assign_ram() function in MachineClass to keep compatibility between machine type versions. The legacy function is used with pseries-2.9, pc-q35-2.9 and pc-i440fx-2.9 (and previous), the new one with all others. Example: qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \ -numa node -numa node -numa node \ -numa node -numa node -numa node Before: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 0 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 0 MB node 4 cpus: 4 node 4 size: 0 MB node 5 cpus: 5 node 5 size: 1024 MB After: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 256 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 256 MB node 4 cpus: 4 node 4 size: 256 MB node 5 cpus: 5 node 5 size: 256 MB [1] https://en.wikipedia.org/wiki/Error_diffusion Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170502162955.1610-2-lvivier@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:47 -03:00
David Gibson	9bf502fe12	spapr: Don't accidentally advertise HTM support on POWER9 Logic in spapr_populate_pa_features() enables the bit advertising Hardware Transactional Memory (HTM) in the guest's device tree only when KVM advertises its availability with the KVM_CAP_PPC_HTM feature. However, this assumes that the HTM bit is off in the base template used for the device tree value. That is true for POWER8, but not for POWER9. It looks like that was accidentally changed in `9fb4541` "spapr: Enable ISA 3.0 MMU mode selection via CAS". Fixes: `9fb4541f58` Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com>	2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh	545d6e2b5c	target/ppc: Enable RADIX mmu mode for pseries TCG guest Now that we have added all the infrastructure we can enable a pseries TCG guest to use radix. In order to do this we have to add the appropriate bits to the ibm,arch-vec-5-platform-support vector to represent that we support both hash and radix mmu models. A radix guest can now be booted in pseries tcg mode by specifying: -cpu POWER9 Note that we assume hash, that is we allocate a hpt, until a guest tells us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in which case we free the hpt. If we were right and the guest is hash then there's nothing for us to do. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Cédric Le Goater	71cd4dace9	spapr: remove the 'nr_servers' field from the machine xics_system_init() does not need 'nr_servers' anymore as it is only used to define the 'interrupt-controller' node in the device tree. So let's just compute the value when calling spapr_dt_xics(). This also gives us an opportunity to simplify the xics_system_init() routine and introduce a specific spapr_ics_create() helper to create the sPAPR ICS object. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:55 +10:00
Cédric Le Goater	5bc8d26de2	spapr: allocate the ICPState object from under sPAPRCPUCore Today, all the ICPs are created before the CPUs, stored in an array under the sPAPR machine and linked to the CPU when the core threads are realized. This modeling brings some complexity when a lookup in the array is required and it can be simplified by allocating the ICPs when the CPUs are. This is the purpose of this proposal which introduces a new 'icp_type' field under the machine and creates the ICP objects of the right type (KVM or not) before the PowerPCCPU object are. This change allows more cleanups : the removal of the icps array under the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id() helper. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	06747ba6d4	spapr: move the IRQ server number mapping under the machine This is the second step to abstract the IRQ 'server' number of the XICS layer. Now that the prereq cleanups have been done in the previous patch, we can move down the 'cpu_dt_id' to 'cpu_index' mapping in the sPAPR machine handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Alexey Kardashevskiy	3dc410ae83	target-ppc/kvm: Enable in-kernel TCE acceleration for multi-tce This enables in-kernel handling of H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls. The host kernel support is there since v4.6, in particular d3695aa4f452 ("KVM: PPC: Add support for multiple-TCE hcalls"). H_PUT_TCE is already accelerated and does not need any special enablement. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	e957f6a9b9	spapr: Workaround for broken radix guests For a little while around 4.9, Linux kernels that saw the radix bit in ibm,pa-features would attempt to set up the MMU as if they were a hypervisor, even if they were a guest, which would cause them to crash. Work around this by detecting pre-ISA 3.0 guests by their lack of that bit in option vector 1, and then removing the radix bit from ibm,pa-features. Note: This now requires regeneration of that node after CAS negotiation. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	9fb4541f58	spapr: Enable ISA 3.0 MMU mode selection via CAS Add the new node, /chosen/ibm,arch-vec-5-platform-support to the device tree. This allows the guest to determine which modes are supported by the hypervisor. Update the option vector processing in h_client_architecture_support() to handle the new MMU bits. This allows guests to request hash or radix mode and QEMU to create the guest's HPT at this time if it is necessary but hasn't yet been done. QEMU will terminate the guest if it requests an unavailable mode, as required by the architecture. Extend the ibm,pa-features node with the new ISA 3.0 values and set the radix bit if KVM supports radix mode. This probably won't be used directly by guests to determine the availability of radix mode (that is indicated by the new node added above) but the architecture requires that it be set when the hardware supports it. If QEMU is using KVM, and KVM is capable of running in radix mode, guests can be run in real-mode without allocating a HPT (because KVM will use a minimal RPT). So in this case, we avoid creating the HPT at reset time and later (during CAS) create it if it is necessary. ISA 3.0 guests will now begin to call h_register_process_table(), which has been added previously. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Strip some unneeded prefix from error messages] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	86d5771a5a	spapr: move spapr_populate_pa_features() In the next patch, spapr_fixup_cpu_dt() will need to call spapr_populate_pa_features() so move it's definition up without making any other changes. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh	b4db54132f	target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the hypervisor where in memory its process table is and how translation should be performed using this process table. Provide the implementation of this H_CALL for a guest. We first check for invalid flags, then parse the flags to determine the operation, and then check the other parameters for valid values based on the operation (register new table/deregister table/maintain registration). The process table is then stored in the appropriate location and registered with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits are updated as required. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Correct missing prototype and uninitialized variable] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	c64abd1f9c	spapr: Add ibm,processor-radix-AP-encodings to the device tree Use the new ioctl, KVM_PPC_GET_RMMU_INFO, to fetch radix MMU information from KVM and present the page encodings in the device tree under ibm,processor-radix-AP-encodings. This provides page size information to the guest which is necessary for it to use radix mode. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Compile fix for 32-bit targets, style nit fix] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Cédric Le Goater	147ff8079e	ppc/spapr: QOM'ify sPAPRRTCState Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold the RTC object. Overall, these changes remove an unnecessary and implicit dependency on SysBus. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
David Gibson	3fa14fbe13	pseries: Add pseries-2.10 machine type Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
David Gibson	8149e2992f	pseries: Enforce homogeneous threads-per-core For reasons that may be useful in future, CPU core objects, as used on the pseries machine type have their own nr-threads property, potentially allowing cores with different numbers of threads in the same system. If the user/management uses the values specified in query-hotpluggable-cpus as they're expected to do, this will never matter in pratice. But that's not actually enforced - it's possible to manually specify a core with a different number of threads from that in -smp. That will confuse the platform - most immediately, this can be used to create a CPU thread with index above max_cpus which leads to an assertion failure in spapr_cpu_core_realize(). For now, enforce that all cores must have the same, standard, number of threads. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>	2017-04-03 13:46:18 +10:00
Marc-André Lureau	24ec2863b1	spapr: fix buffer-overflow Running postcopy-test with ASAN produces the following error: QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/postcopy-test ... ================================================================= ==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0 READ of size 8 at 0x7f1556600000 thread T6 #0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528 #1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665 #2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044 #3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976 #4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9) #5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e) 0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000) allocated by thread T0 here: #0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980) #1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106 #2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122 #3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214 #4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261 #5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697 #6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679 #7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400) Thread T6 created by T0 here: #0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488) #1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465 #2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096 #3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500 #4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87 #5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142 #6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88 #7 0x7f15823e38e6 (/lib64/libglib-2.0.so.0+0x468e6) SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass index seems to be wrongly incremented, unless I miss something that would be worth a comment. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-29 11:35:02 +11:00
Laurent Vivier	55641213fc	numa,spapr: align default numa node memory size to 256MB Since commit `224245b` ("spapr: Add LMB DR connectors"), NUMA node memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE). But when "-numa" option is provided without "mem" parameter, the memory is equally divided between nodes, but 8MB aligned. This can be not valid for pseries. In that case we can have: $ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB With this patch, we have: (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 1280 MB node 1 cpus: node 1 size: 1280 MB node 2 cpus: node 2 size: 1536 MB Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-22 11:32:42 +11:00
David Gibson	82516263ce	pseries: Don't expose PCIe extended config space on older machine types `bb9986452` "spapr_pci: Advertise access to PCIe extended config space" allowed guests to access the extended config space of PCI Express devices via the PAPR interfaces, even though the paravirtualized bus mostly acts like plain PCI. However, that patch enabled access unconditionally, including for existing machine types, which is an unwise change in behaviour. This patch limits the change to pseries-2.9 (and later) machine types. Suggested-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-14 11:54:17 +11:00
Cédric Le Goater	7ea6e06717	ppc/xics: register reset handlers for the ICP and ICS objects The recent changes on the XICS layer removed the XICSState object to let the sPAPR machine handle the ICP and ICS directly. The reset of these objects was previously handled by XICSState, which was a SysBus device, and to keep the same behavior, the ICP and ICS were assigned to SysbBus. But that broke the 'info qtree' command in the monitor. 'qtree' performs a loop on the children of a bus to print their properties and SysBus devices are expected to be found under SysBus, which is not the case anymore. The fix for this problem is to register reset handlers for the ICP and ICS objects and stop using SysBus for such devices. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-06 10:07:38 +11:00
Sam Bobroff	ec975e839c	spapr: Small cleanup of PPC MMU enums The PPC MMU types are sometimes treated as if they were a bit field and sometime as if they were an enum which causes maintenance problems: flipping bits in the MMU type (which is done on both the 1TB segment and 64K segment bits) currently produces new MMU type values that are not handled in every "switch" on it, sometimes causing an abort(). This patch provides some macros that can be used to filter out the "bit field-like" bits so that the remainder of the value can be switched on, like an enum. This allows removal of all of the "degraded" types from the list and should ease maintenance. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
Suraj Jitindar Singh	4975c098c9	target/ppc/POWER9: Add POWER9 pa-features definition Add a pa-features definition which includes all of the new fields which have been added, note we don't claim support for any of these new features at this stage. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
Suraj Jitindar Singh	9861bb3efd	target/ppc: Add patb_entry to sPAPRMachineState ISA v3.00 adds the idea of a partition table which is used to store the address translation details for all partitions on the system. The partition table consists of double word entries indexed by partition id where the second double word contains the location of the process table in guest memory. The process table is registered by the guest via a h-call. We need somewhere to store the address of the process table so we add an entry to the sPAPRMachineState struct called patb_entry to represent the second doubleword of a single partition table entry corresponding to the current guest. We need to store this value so we know if the guest is using radix or hash translation and the location of the corresponding process table in guest memory. Since we only have a single guest per qemu instance, we only need one entry. Since the partition table is technically a hypervisor resource we require that access to it is abstracted by the virtual hypervisor through the get_patbe() call. Currently the value of the entry is never set (and thus defaults to 0 indicating hash), but it will be required to both implement POWER9 kvm support and tcg radix support. We also add this field to be migrated as part of the sPAPRMachineState as we will need it on the receiving side as the guest will never tell us this information again and we need it to perform translation. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
Cédric Le Goater	6449da4545	ppc/xics: move InterruptStatsProvider to the sPAPR machine It provides a better monitor output of the ICP and ICS objects, else the objects are printed out of order. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	a7ff1212e9	ppc/xics: move ics-simple post_load under the machine The ICS object uses a post_load() handler which is implicitly relying on the fact that the internal state of the ICS and ICP objects has been restored but this is not guaranteed. So, let's move the code under the post_load() handler of the machine where we know the objects have been fully restored. The icp_resend() handler of the XICSFabric QOM interface is also removed as it is now obsolete. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	e6f7e110ee	ppc/xics: remove the XICSState classes The XICSState classes are not used anymore. They have now been fully deprecated by the XICSFabric QOM interface. Do the cleanups. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	2192a9303d	ppc/xics: export the XICS init routines There is nothing left related to the XICS object in the realize functions of the KVMXICSState and XICSState class. So adapt the interfaces to call these routines directly from the sPAPR machine init sequence. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	852ad27e14	ppc/xics: move the ICP array under the sPAPR machine This is the last step to remove the XICSState abstraction and have the machine hold all the objects related to interrupts : ICSs and ICPs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	20147f2fce	ppc/xics: register the reset handler of ICP objects The reset of the ICP objects is currently handled by XICS but this can be done for each individual ICP. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	b0ec31290c	ppc/xics: simplify spapr_dt_xics() interface spapr_dt_xics() only needs the number of servers to build the device tree nodes. Let's change the routine interface to reflect that. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	b4f27d71e3	ppc/xics: use the QOM interface to grab an ICP Also introduce a xics_icp_get() helper to simplify the changes. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	b2fc59aaf9	ppc/xics: extend the QOM interface to handle ICPs Let's add two new handlers for ICPs. One is to get an ICP object from a server number and a second is to resend the irqs when needed. The icp_resend() handler is a temporary workaround needed by the ics-simple post_load() handler. It will be removed when the post_load portion can be done at the machine level. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	d114a66225	ppc/xics: remove the XICS list of ICS This is not used anymore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	c79b2fdd7b	ppc/xics: register the reset handler of ICS objects The reset of the ICS objects is currently handled by XICS but this can be done for each individual ICS. This also reduces the use of the XICS list of ICS. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	2cd908d0ad	ppc/xics: use the QOM interface to resend irqs Also change the ICPState 'xics' backlink to be a XICSFabric, this removes the need of using qdev_get_machine() to get the QOM interface in some of the routines. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	7844e12b28	ppc/xics: use the QOM interface under the sPAPR machine Add 'ics_get' and 'ics_resend' handlers to the sPAPR machine. These are relatively simple for a single ICS. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	681bfaded6	ppc/xics: store the ICS object under the sPAPR machine A list of ICS objects was introduced under the XICS object for the PowerNV machine but, for the sPAPR machine, it brings extra complexity as there is only a single ICS. To simplify the code, let's add the ICS pointer under the sPAPR machine and try to reduce the use of this list where possible. Also, change the xics_spapr_*() routines to use an ICS object instead of an XICSState and change their name to reflect that these are specific to the sPAPR ICS object. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	817bb6a446	ppc/xics: remove set_nr_servers() handler from XICSStateClass Today, the ICP (Interrupt Controller Presenter) objects are created by the 'nr_servers' property handler of the XICS object and a class handler. They are realized in the XICS object realize routine. Let's simplify the process by creating the ICP objects along with the XICS object at the machine level. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	4e4169f7a2	ppc/xics: remove set_nr_irqs() handler from XICSStateClass Today, the ICS (Interrupt Controller Source) object is created and realized by the init and realize routines of the XICS object, but some of the parameters are only known at the machine level. These parameters are passed from the sPAPR machine to the ICS object in a rather convoluted way using property handlers and a class handler of the XICS object. The number of irqs required to allocate the IRQ state objects in the ICS realize routine is one of them. Let's simplify the process by creating the ICS object along with the XICS object at the machine level and link the ICS into the XICS list of ICSs at this level also. In the sPAPR machine, there is only a single ICS but that will change with the PowerNV machine. Also, QOMify the creation of the objects and get rid of the superfluous code. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
David Gibson	738d5db824	xics: XICS should not be a SysBusDevice Currently xics - the component of the IBM POWER interrupt controller representing the overall interrupt fabric / architecture is represented as a descendent of SysBusDevice. However, this is not really correct - the xics presents nothing in MMIO space so it should be an "unattached" device in the current QOM model. Since this device will always be created by the machine type, not created specifically from the command line, and because it has no migrated state it should be safe to move it around the device composition tree. Therefore this patch changes it to a descendent of TYPE_DEVICE, and makes it an unattached device. So that its reset handler still gets called correctly, we add a qdev_set_parent_bus() to attach it to sysbus. It's not really clear that's correct (instead of using register_reset()) but it appears to a common technique. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [clg corrected problems with reset] Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg folded together and updated commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
David Gibson	e57ca75ce3	target/ppc: Manage external HPT via virtual hypervisor The pseries machine type implements the behaviour of a PAPR compliant hypervisor, without actually executing such a hypervisor on the virtual CPU. To do this we need some hooks in the CPU code to make hypervisor facilities get redirected to the machine instead of emulated internally. For hypercalls this is managed through the cpu->vhyp field, which points to a QOM interface with a method implementing the hypercall. For the hashed page table (HPT) - also a hypervisor resource - we use an older hack. CPUPPCState has an 'external_htab' field which when non-NULL indicates that the HPT is stored in qemu memory, rather than within the guest's address space. For consistency - and to make some future extensions easier - this merges the external HPT mechanism into the vhyp mechanism. Methods are added to vhyp for the basic operations the core hash MMU code needs: map_hptes() and unmap_hptes() for reading the HPT, store_hpte() for updating it and hpt_mask() to retrieve its size. To match this, the pseries machine now sets these vhyp fields in its existing vhyp class, rather than reaching into the cpu object to set the external_htab field. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-03-01 11:23:39 +11:00
Greg Kurz	6244bb7e58	sysemu: support up to 1024 vCPUs Some systems can already provide more than 255 hardware threads. Bumping the QEMU limit to 1024 seems reasonable: - it has no visible overhead in top; - the limit itself has no effect on hot paths. Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Peter Maydell	28f997a82c	This is the MTTCG pull-request as posted yesterday. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJYsBZfAAoJEPvQ2wlanipElJ0H+QGoStSPeHrvKu7Q07v4F9zM Pvf05gRsaxvXl7UbwmXC4oKhvZf9rVJ6ITk0x/y0WvmK0mHCmNBWkC0nn5UFL5IH cdxetLz21Q+Ghpc36tZvqn2HYwRQFoEznge2LdtBDG0TyVA4jwquHU3HCG2D51zi BaImI6lYW1e4ejjZHw8cEInSxsj/HJZE4pPas2Tkci+uAnrJroErwBVRRcE/y/Tn aupl9TJFs2JdyJFNDibIm0kjB+i+jvCiLgYjbKZ/dR/+GZt73TtiBk/q9ZOFjdmT 7YFPI3F46QbGHoZahtzh0Xt7WMj94SlQgQ9OJ3zmNMfpXrze6Yc78xo/nbQ33U0= =wR0/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stsquad/tags/pull-mttcg-240217-1' into staging This is the MTTCG pull-request as posted yesterday. # gpg: Signature made Fri 24 Feb 2017 11:17:51 GMT # gpg: using RSA key 0xFBD0DB095A9E2A44 # gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" # Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44 * remotes/stsquad/tags/pull-mttcg-240217-1: (24 commits) tcg: enable MTTCG by default for ARM on x86 hosts hw/misc/imx6_src: defer clearing of SRC_SCR reset bits target-arm: ensure all cross vCPUs TLB flushes complete target-arm: don't generate WFE/YIELD calls for MTTCG target-arm/powerctl: defer cpu reset work to CPU context cputlb: introduce tlb_flush__all_cpus[_synced] cputlb: atomically update tlb fields used by tlb_reset_dirty cputlb: add tlb_flush_by_mmuidx async routines cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap cputlb: introduce tlb_flush_ async work. cputlb: tweak qemu_ram_addr_from_host_nofail reporting cputlb: add assert_cpu_is_self checks tcg: handle EXCP_ATOMIC exception for system emulation tcg: enable thread-per-vCPU tcg: enable tb_lock() for SoftMMU tcg: remove global exit_request tcg: drop global lock during TCG code execution tcg: rename tcg_current_cpu to tcg_current_rr_cpu tcg: add kick timer for single-threaded vCPU emulation tcg: add options for enabling MTTCG ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-25 18:43:52 +00:00
Jan Kiszka	8d04fb55de	tcg: drop global lock during TCG code execution This finally allows TCG to benefit from the iothread introduction: Drop the global mutex while running pure TCG CPU code. Reacquire the lock when entering MMIO or PIO emulation, or when leaving the TCG loop. We have to revert a few optimization for the current TCG threading model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not kicking it in qemu_cpu_kick. We also need to disable RAM block reordering until we have a more efficient locking mechanism at hand. Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here. These numbers demonstrate where we gain something: 20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-system-arm 20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-system-arm The guest CPU was fully loaded, but the iothread could still run mostly independent on a second core. Without the patch we don't get beyond 32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-system-arm 32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-system-arm We don't benefit significantly, though, when the guest is not fully loading a host CPU. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex] Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [EGC: fixed iothread lock for cpu-exec IRQ handling] Signed-off-by: Emilio G. Cota <cota@braap.org> [AJB: -smp single-threaded fix, clean commit msg, BQL fixes] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> [PM: target-arm changes] Acked-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-24 10:32:45 +00:00
Thomas Huth	df58713396	hw/ppc/spapr: Check for valid page size when hot plugging memory On POWER, the valid page sizes that the guest can use are bound to the CPU and not to the memory region. QEMU already has some fancy logic to find out the right maximum memory size to tell it to the guest during boot (see getrampagesize() in the file target/ppc/kvm.c for more information). However, once we're booted and the guest is using huge pages already, it is currently still possible to hot-plug memory regions that does not support huge pages - which of course does not work on POWER, since the guest thinks that it is possible to use huge pages everywhere. The KVM_RUN ioctl will then abort with -EFAULT, QEMU spills out a not very helpful error message together with a register dump and the user is annoyed that the VM unexpectedly died. To avoid this situation, we should check the page size of hot-plugged DIMMs to see whether it is possible to use it in the current VM. If it does not fit, we can print out a better error message and refuse to add it, so that the VM does not die unexpectely and the user has a second chance to plug a DIMM with a matching memory backend instead. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1419466 Signed-off-by: Thomas Huth <thuth@redhat.com> [dwg: Fix a build error on 32-bit builds with KVM] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 14:28:53 +11:00
Igor Mammedov	c5514d0e4b	machine: replace query_hotpluggable_cpus() callback with has_hotpluggable_cpus flag Generic helper machine_query_hotpluggable_cpus() replaced target specific query_hotpluggable_cpus() callbacks so there is no need in it anymore. However inon NULL callback value is used to detect/report hotpluggable cpus support, therefore it can be removed completely. Replace it with MachineClass.has_hotpluggable_cpus boolean which is sufficient for the task. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Igor Mammedov	f2d672c248	machine: unify [pc_\|spapr_]query_hotpluggable_cpus() callbacks All callbacks FOO_query_hotpluggable_cpus() are practically the same except of setting vcpus_count to different values. Convert them to a generic machine_query_hotpluggable_cpus() callback by moving vcpus_count initialization to per machine specific callback possible_cpu_arch_ids(). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Igor Mammedov	535455fdee	spapr: reuse machine->possible_cpus instead of cores[] Replace SPAPR specific cores[] array with generic machine->possible_cpus and store core objects there. It makes cores bookkeeping similar to x86 cpus and will allow to unify similar code. It would allow to replace cpu_index based NUMA node mapping with iproperty based one (for -device created cores) since possible_cpus carries board defined topology/layout. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Igor Mammedov	115debf26c	spapr: make cpu core unplug follow expected hotunplug call flow spapr_core_unplug() were essentially spapr_core_unplug_request() handler that requested CPU removal and registered callback which did actual cpu core removali but it was called from spapr_machine_device_unplug() which is intended for actual object removal. Commit (`cf632463` spapr: Memory hot-unplug support) sort of fixed it introducing spapr_machine_device_unplug_request() and calling spapr_core_unplug() but it hasn't renamed callback and by mistake calls it from spapr_machine_device_unplug(). However spapr_machine_device_unplug() isn't ever called for cpu core since spapr_core_release() doesn't follow expected hotunplug call flow which is: 1: device_del() -> hotplug_handler_unplug_request() -> set destroy_cb() 2: destroy_cb() -> hotplug_handler_unplug() -> object_unparent // actual device removal Fix it by renaming spapr_core_unplug() to spapr_core_unplug_request() which is called from spapr_machine_device_unplug_request() and making spapr_core_release() call hotplug_handler_unplug() which will call spapr_machine_device_unplug() -> spapr_core_unplug() to remove cpu core. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reveiwed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:27 +11:00
Igor Mammedov	ff9006ddbf	spapr: move spapr_core_[foo]plug() callbacks close to machine code in spapr.c spapr_core_pre_plug/spapr_core_plug/spapr_core_unplug() are managing wiring CPU core into spapr machine state and not internal CPU core state. So move them from spapr_cpu_core.c to spapr.c where other similar (spapr_memory_[foo]plug()) callbacks are located, which also matches x86 target practice. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:27 +11:00

... 4 5 6 7 8 ...

870 Commits