qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
Greg Kurz	75de59416d	spapr: Print out extra hints when CAS negotiation of interrupt mode fails Let's suggest to the user how the machine should be configured to allow the guest to boot successfully. Suggested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155799221739.527449.14907564571096243745.stgit@bahia.lan> Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> [dwg: Adjusted for style error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
Greg Kurz	e7f78db9fb	spapr/xive: Sanity checks of OV5 during CAS If a machine is started with ic-mode=xive but the guest only knows about XICS, eg. an RHEL 7.6 guest, the kernel panics. This is expected but a bit unfortunate since the crash doesn't provide much information for the end user to guess what's happening. Detect that during CAS and exit QEMU with a proper error message instead, like it is already done for the MMU. Even if this is less likely to happen, the opposite case of a guest that only knows about XIVE would certainly fail all the same if the machine is started with ic-mode=xics. Also, the only valid values a guest can pass in byte 23 of OV5 during CAS are 0b00 (XIVE legacy mode) and 0b01 (XIVE exploitation mode). Any other value is a bug, at least with the current spec. Again, it does not seem right to let the guest go on without a precise idea of the interrupt mode it asked for. Handle these cases as well. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155793986451.464434.12887933000007255549.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
Benjamin Herrenschmidt	a2dd4e83e7	ppc/hash64: Rework R and C bit updates With MT-TCG, we are now running translation in a racy way, thus we need to mimic hardware when it comes to updating the R and C bits, by doing byte stores. The current "store_hpte" abstraction is ill suited for this, we replace it with two separate callbacks for setting R and C. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00
Benjamin Herrenschmidt	993aaf0c00	ppc/spapr: Use proper HPTE accessors for H_READ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-04-26 11:37:57 +10:00
David Gibson	49e9fdd741	spapr: Correctly set LPCR[GTSE] in H_REGISTER_PROCESS_TABLE `176dccee` "target/ppc/spapr: Clear partition table entry when allocating hash table" reworked the H_REGISTER_PROCESS_TABLE hypercall, but unfortunately due to a small error no longer correctly sets the LPCR[GTSE] bit which allows the guest to directly execute (some types of) tlbie (TLB flush) instructions without involving the hypervisor. We got away with this, initially, because POWER9 did not have hypervisor mode enabled in its msr_mask, which meant we didn't actually run hypervisor privilege checks in TCG at all. However, `da874d90` "target/ppc: add HV support for POWER9" turned on HV support on POWER9 for the benefit of the powernv machine type. This exposed the earlier bug in H_REGISTER_PROCESS_TABLE, and causes guests which rely on LPCR[GTSE] (i.e. basically all of them) to crash during early boot when their first tlbie instruction causes an unexpected trap. Fixes: `176dccee` target/ppc/spapr: Clear partition table entry when allocating hash table Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Cleber Rosa <crosa@redhat.com>	2019-03-19 15:20:14 +11:00
David Gibson	ce2918cbc3	spapr: Use CamelCase properly The qemu coding standard is to use CamelCase for type and structure names, and the pseries code follows that... sort of. There are quite a lot of places where we bend the rules in order to preserve the capitalization of internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR". That was a bad idea - it frequently leads to names ending up with hard to read clusters of capital letters, and means they don't catch the eye as type identifiers, which is kind of the point of the CamelCase convention in the first place. In short, keeping type identifiers look like CamelCase is more important than preserving standard capitalization of internal "words". So, this patch renames a heap of spapr internal type names to a more standard CamelCase. In addition to case changes, we also make some other identifier renames: VIOsPAPR* -> SpaprVio* The reverse word ordering was only ever used to mitigate the capital cluster, so revert to the natural ordering. VIOsPAPRVTYDevice -> SpaprVioVty VIOsPAPRVLANDevice -> SpaprVioVlan Brevity, since the "Device" didn't add useful information sPAPRDRConnector -> SpaprDrc sPAPRDRConnectorClass -> SpaprDrcClass Brevity, and makes it clearer this is the same thing as a "DRC" mentioned in many other places in the code This is 100% a mechanical search-and-replace patch. It will, however, conflict with essentially any and all outstanding patches touching the spapr code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:05 +11:00
Suraj Jitindar Singh	176dcceedd	target/ppc/spapr: Clear partition table entry when allocating hash table If we allocate a hash page table then we know that the guest won't be using process tables, so set the partition table entry maintained for the guest to zero. If this isn't done, then the guest radix bit will remain set in the entry. This means that when the guest calls H_REGISTER_PROCESS_TABLE there will be a mismatch between then flags and the value in spapr->patb_entry, and the call will fail. The guest will then panic: Failed to register process table (rc=-4) kernel BUG at arch/powerpc/platforms/pseries/lpar.c:959 The result being that it isn't possible to boot a hash guest on a P9 system. Also fix a bug in the flags parsing in h_register_process_table() which was introduced by the same patch, and simplify the handling to make it less likely that errors will be introduced in the future. The effect would have been setting the host radix bit LPCR_HR for a hash guest using process tables, which currently isn't supported and so couldn't have been triggered. Fixes: `00fd075e18` "target/ppc/spapr: Set LPCR:HR when using Radix mode" Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190305022102.17610-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh	8ff43ee404	target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate the requirement for a hw-assisted version of the count cache flush workaround. The count cache flush workaround is a software workaround which can be used to flush the count cache on context switch. Some revisions of hardware may have a hardware accelerated flush, in which case the software flush can be shortened. This cap is used to set the availability of such hardware acceleration for the count cache flush routine. The availability of such hardware acceleration is indicated by the H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com> [dwg: Small style fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh	399b2896d4	target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability for mitigations for indirect branch speculation. Currently the available values are broken (default), fixed-ibs (fixed by serialising indirect branches) and fixed-ccd (fixed by diabling the count cache). Introduce a new value for this capability denoted workaround, meaning that software can work around the issue by flushing the count cache on context switch. This option is available if the hypervisor sets the H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 12:07:49 +11:00
Benjamin Herrenschmidt	79825f4d58	target/ppc: Rename PATB/PATBE -> PATE That "b" means "base address" and thus shouldn't be in the name of actual entries and related constants. This patch keeps the synthetic patb_entry field of the spapr virtual hypervisor unchanged until I figure out if that has an impact on the migration stream. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt	00fd075e18	target/ppc/spapr: Set LPCR:HR when using Radix mode The HW relies on LPCR:HR along with the PATE to determine whether to use Radix or Hash mode. In fact it uses LPCR:HR more commonly than the PATE. For us, it's also more efficient to do so, especially since unlike the HW we do not maintain a cache of the current PATE and HV PATE in a generic place. Prepare the grounds for that by ensuring that LPCR:HR is set properly on SPAPR machines. Another option would have been to use a callback to get the PATE but this gets messy when implementing bare metal support, it's much simpler (and faster) to use LPCR. Since existing migration streams may not have it, fix it up in spapr_post_load() as well based on the pseudo-PATE entry that we keep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-02-26 09:21:25 +11:00
Cédric Le Goater	13db0cd9b8	spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS The 'dual' sPAPR IRQ backend supports both interrupt mode, XIVE exploitation mode and the legacy compatibility mode (XICS). both modes are not supported at the same time. The machine starts with the legacy mode and a new interrupt mode can then be negotiated by the CAS process. In this case, the new mode is activated after a reset to take into account the required changes in the machine. These impact the device tree layout, the interrupt presenter object and the exposed MMIO regions in the case of XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:14 +11:00
Alexey Kardashevskiy	fea35ca4b8	ppc/spapr: Receive and store device tree blob from SLOF SLOF receives a device tree and updates it with various properties before switching to the guest kernel and QEMU is not aware of any changes made by SLOF. Since there is no real RTAS (QEMU implements it), it makes sense to pass the SLOF final device tree to QEMU to let it implement RTAS related tasks better, such as PCI host bus adapter hotplug. Specifially, now QEMU can find out the actual XICS phandle (for PHB hotplug) and the RTAS linux,rtas-entry/base properties (for firmware assisted NMI - FWNMI). This stores the initial DT blob in the sPAPR machine and replaces it in the KVMPPC_H_UPDATE_DT (new private hypercall) handler. This adds an @update_dt_enabled machine property to allow backward migration. SLOF already has a hypercall since https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183 This makes use of the new fdt_check_full() helper. In order to allow the configure script to pick the correct DTC version, this adjusts the DTC presense test. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:13 +11:00
Laurent Vivier	c24ba3d0a3	spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITY H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain designation associated with the identifier input parameter This fixes a crash when we try to hotplug a CPU in memory-less and CPU-less numa node. In this case, the kernel tries to online the node, but without the information provided by this h-call, the node id, it cannot and the CPU is started while the node is not onlined. It also removes the warning message from the kernel: VPHN is not supported. Disabling polling.. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-01-09 09:28:13 +11:00
David Gibson	7388efafc2	target/ppc, spapr: Move VPA information to machine_data CPUPPCState currently contains a number of fields containing the state of the VPA. The VPA is a PAPR specific concept covering several guest/host shared memory areas used to communicate some information with the hypervisor. As a PAPR concept this is really machine specific information, although it is per-cpu, so it doesn't really belong in the core CPU state structure. There's also other information that's per-cpu, but platform/machine specific. So create a (void *)machine_data in PowerPCCPU which can be used by the machine to locate per-cpu data. Intialization, lifetime and cleanup of machine_data is entirely up to the machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2018-06-16 16:32:50 +10:00
Greg Kurz	2c9dfdacc5	spapr: fix leak in h_client_architecture_support() If the negotiated compat mode can't be set, but raw mode is supported, we decide to ignore the error. An so, we should free it to prevent a memory leak. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-06-16 16:32:33 +10:00
David Hildenbrand	e017da370b	machine: rename MemoryHotplugState to DeviceMemoryState Rename it to better match the new terminology. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-9-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	b0c14ec4ef	machine: make MemoryHotplugState accessible via the machine Let's allow to query the MemoryHotplugState directly from the machine. If the pointer is NULL, the machine does not support memory devices. If the pointer is !NULL, the machine supports memory devices and the data structure contains information about the applicable physical guest address space region. This allows us to generically detect if a certain machine has support for memory devices, and to generically manage it (find free address range, plug/unplug a memory region). We will rename "MemoryHotplugState" to something more meaningful ("DeviceMemory") after we completed factoring out the pc-dimm code into MemoryDevice code. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] [ehabkost: squashed fix to use g_malloc0()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	2cc0e2e814	pc-dimm: factor out MemoryDevice interface On the qmp level, we already have the concept of memory devices: "query-memory-devices" Right now, we only support NVDIMM and PCDIMM. We want to map other devices later into the address space of the guest. Such device could e.g. be virtio devices. These devices will have a guest memory range assigned but won't be exposed via e.g. ACPI. We want to make them look like memory device, but not glued to pc-dimm. Especially, it will not always be possible to have TYPE_PC_DIMM as a parent class (e.g. virtio devices). Let's use an interface instead. As a first part, convert handling of - qmp_pc_dimm_device_list - get_plugged_memory_size to our new model. plug/unplug stuff etc. will follow later. A memory device will have to provide the following functions: - get_addr(): Necessary, as the property "addr" can e.g. not be used for virtio devices (already defined). - get_plugged_size(): The amount this device offers to the guest as of now. - get_region_size(): Because this can later on be bigger than the plugged size. - fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Gibson	295b6c26ac	spapr: Clean up LPCR updates from hypercalls There are several places in spapr_hcall.c where we need to update the LPCR value on all CPUs. We do this with the set_spr() helper. That's not really correct because this directly sets the SPR value, without going through the ppc_store_lpcr() helper which may need to update state based on the LPCR change. In fact, set_spr() is only ever used for the LPCR, so replace it with an explicit LPCR updated which uses the right low-level helper. While we're there, move the CPU_FOREACH() which was in every one of the callers into the new helper: set_all_lpcrs(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org>	2018-05-04 15:00:37 +10:00
Suraj Jitindar Singh	c76c0d3090	ppc/spapr-caps: Convert cap-ibs to custom spapr-cap Convert cap-ibs (indirect branch speculation) to a custom spapr-cap type. All tristate caps have now been converted to custom spapr-caps, so remove the remaining support for them. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [dwg: Don't explicitly list "?"/help option, trust convention] [dwg: Fold tristate removal into here, to not break bisect] [dwg: Fix minor style problems] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-03-06 13:16:29 +11:00
Daniel Henrique Barboza	9478956794	hw/ppc/spapr_hcall: set htab_shift after kvmppc_resize_hpt_commit Newer kernels have a htab resize capability when adding or remove memory. At these situations, the guest kernel might reallocate its htab to a more suitable size based on the resulting memory. However, we're not setting the new value back into the machine state when a KVM guest resizes its htab. At first this doesn't seem harmful, but when migrating or saving the guest state (via virsh managedsave, for instance) this mismatch between the htab size of QEMU and the kernel makes the guest hangs when trying to load its state. Inside h_resize_hpt_commit, the hypercall that commits the hash page resize changes, let's set spapr->htab_shift to the new value if we're sure that kvmppc_resize_hpt_commit were successful. While we're here, add a "not RADIX" sanity check as it is already done in the related hypercall h_resize_hpt_prepare. Fixes: https://github.com/open-power-host-os/qemu/issues/28 Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-16 12:14:26 +11:00
Daniel Henrique Barboza	b472b1a727	hw/ppc: rename functions in comments Commit `bcb5ce08cf` ("spapr: Rename machine init functions for clarity") renamed ppc_spapr_reset to spapr_machine_reset and ppc_spapr_init to spapr_machine_init. Let's also rename the references in comments. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-10 12:17:17 +11:00
Greg Kurz	fa86f59234	spapr: add missing break in h_get_cpu_characteristics() Detected by Coverity (CID 1385702). This fixes the recently added hypercall to let guests properly apply Spectre and Meltdown workarounds. Fixes: `c59704b254` "target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS" Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-02-10 12:17:17 +11:00
Suraj Jitindar Singh	c59704b254	target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query behaviours and available characteristics of the cpu. Implement the handler for this new H-Call which formulates its response based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-29 14:24:55 +11:00
Philippe Mathieu-Daudé	1945e6ab47	ppc: remove duplicated includes applied using ./scripts/clean-includes not needed since `7ebaf79556` Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Sam Bobroff	e05fba5004	target/ppc: correct htab shift for hash on radix KVM HV will soon support running a guest in hash mode on a POWER9 host running in radix mode (see [1]), however the guest currently fails to boot. This is because the "htab_shift" value (the size of the MMU's hash table) is added to the device tree before KVM has had a chance to change it. If the host is in hash mode, KVM does not need to change it and so the problem is not seen, but when the host is in radix mode a change is required and we see a problem. To fix this, move the call spapr_setup_hpt_and_vrma() (where htab_shift could be changed) up a little so that it's called before spapr_h_cas_compose_response() (where htab_shift is added to the device tree). Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [1] See http://www.spinics.net/lists/kvm-ppc/msg13057.html Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-14 10:28:32 +11:00
David Gibson	db50f280cf	spapr: Correct RAM size calculation for HPT resizing In order to prevent the guest from forcing the allocation of large amounts of qemu memory (or host kernel memory, in the case of KVM HV), we limit the size of Hashed Page Table (HPT) it is allowed to allocated, based on its RAM size. However, the current calculation is not correct: it only adds up the size of plugged memory, ignoring the base memory size. This patch corrects it. While we're there, use get_plugged_memory_size() instead of directly calling pc_existing_dimms_capacity(). The only difference is that it will abort on failure, which is right: a failure here indicates something wrong within qemu. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-10-17 10:34:01 +11:00
Greg Kurz	1ec26c757d	spapr: fix the value of SDR1 in kvmppc_put_books_sregs() When running with KVM PR, if a new HPT is allocated we need to inform KVM about the HPT address and size. This is currently done by hacking the value of SDR1 and pushing it to KVM in several places. Also, migration breaks the guest since it is very unlikely the HPT has the same address in source and destination, but we push the incoming value of SDR1 to KVM anyway. This patch introduces a new virtual hypervisor hook so that the spapr code can provide the correct value of SDR1 to be pushed to KVM each time kvmppc_put_books_sregs() is called. It allows to get rid of all the hacking in the spapr/kvmppc code and it fixes migration of nested KVM PR. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Cédric Le Goater	30bf9ed168	spapr: fix CAS-generated reset The OV5_MMU_RADIX_300 requires special handling in the CAS negotiation process. It is cleared from the option vector of the guest before evaluating the changes and re-added later. But, when testing for a possible CAS reset : spapr->cas_reboot = spapr_ovec_diff(ov5_updates, ov5_cas_old, spapr->ov5_cas); the bit OV5_MMU_RADIX_300 will each time be seen as removed from the previous OV5 set, hence generating a reset loop. Fix this problem by also clearing the same bit in the ov5_cas_old set. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Greg Kurz	4c563d9df5	spapr: only update SDR1 once per-cpu during CAS Commit `b55d295e3e` added the possibility to support HPT resizing with KVM. In the case of PR, we need to pass the userspace address of the HPT to KVM using the SDR1 slot. This is handled by kvmppc_update_sdr1() which uses CPU_FOREACH() to update all CPUs. It is hence not needed to call kvmppc_update_sdr1() for each CPU. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Greg Kurz	cc7b35b169	spapr: fallback to raw mode if best compat mode cannot be set during CAS KVM PR doesn't allow to set a compat mode. This causes ppc_set_compat_all() to fail and we return H_HARDWARE to the guest right away. This is excessive: even if we favor compat mode since commit `152ef803ce`, we should at least fallback to raw mode if the guest supports it. This patch modifies cas_check_pvr() so that it also reports that the real PVR was found in the table supplied by the guest. Note that this is only makes sense if raw mode isn't explicitely disabled (ie, the user didn't set the machine "max-cpu-compat" property). If this is the case, we can simply ignore ppc_set_compat_all() failures, and let the guest run in raw mode. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	2e886fb391	ppc: spapr: Make VCPU ID handling private to SPAPR The concept of a VCPU ID that differs from the CPU's index (cpu->cpu_index) exists only within SPAPR machines so, move the functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c and rename them appropriately. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	81210c2009	ppc: spapr: Rename cpu_dt_id to vcpu_id This field actually records the VCPU ID used by KVM and, although the value is also used in the device tree it is primarily the VCPU ID so rename it as such. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Updated comment missed in cpu.h] Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	f57467e3b3	spapr: Fix bug in h_signal_sys_reset() The unicast case in h_signal_sys_reset() seems to be broken: rather than selecting the target CPU, it looks like it will pick either the first CPU or fail to find one at all. Fix it by using the search function rather than open coding the search. This was found by inspection; the code appears to be unused because the Linux kernel only uses the broadcast target. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-09 14:04:28 +10:00
David Gibson	b55d295e3e	pseries: Allow HPT resizing with KVM So far, qemu implements the PAPR Hash Page Table (HPT) resizing extension with TCG. The same implementation will work with KVM PR, but we don't currently allow that. For KVM HV we can only implement resizing with the assistance of the host kernel, which needs a new capability and ioctl()s. This patch adds support for testing the new KVM capability and implementing the resize in terms of KVM facilities when necessary. If we're running on a kernel which doesn't have the new capability flag at all, we fall back to testing for PR vs. HV KVM using the same hack that we already use in a number of places for older kernels. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	2772cf6be9	pseries: Use smaller default hash page tables when guest can resize We've now implemented a PAPR extension allowing PAPR guest to resize their hash page table (HPT) during runtime. This patch makes use of that facility to allocate smaller HPTs by default. Specifically when a guest is aware of the HPT resize facility, qemu sizes the HPT to the initial memory size, rather than the maximum memory size on the assumption that the guest will resize its HPT if necessary for hot plugged memory. When the initial memory size is much smaller than the maximum memory size (a common configuration with e.g. oVirt / RHEV) then this can save significant memory on the HPT. If the guest does not advertise HPT resize awareness when it makes the ibm,client-architecture-support call, qemu resizes the HPT for maxmimum memory size (unless it's been configured not to allow such guests at all). For now we make that reallocation assuming the guest has not yet used the HPT at all. That's true in practice, but not, strictly, an architectural or PAPR requirement. If we need to in future we can fix this by having the client-architecture-support call reboot the guest with the revised HPT size (the client-architecture-support call is explicitly permitted to trigger a reboot in this way). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-07-17 15:07:05 +10:00
David Gibson	0b0b831016	pseries: Implement HPT resizing This patch implements hypercalls allowing a PAPR guest to resize its own hash page table. This will eventually allow for more flexible memory hotplug. The implementation is partially asynchronous, handled in a special thread running the hpt_prepare_thread() function. The state of a pending resize is stored in SPAPR_MACHINE->pending_hpt. The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or, if one is already in progress, monitor it for completion. If there is an existing HPT resize in progress that doesn't match the size specified in the call, it will cancel it, replacing it with a new one matching the given size. The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only be called successfully once H_RESIZE_HPT_PREPARE has successfully completed initialization of a new HPT. The guest must ensure that there are no concurrent accesses to the existing HPT while this is called (this effectively means stop_machine() for Linux guests). For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each HPTE into the new HPT. This can have quite high latency, but it seems to be of the order of typical migration downtime latencies for HPTs of size up to ~2GiB (which would be used in a 256GiB guest). In future we probably want to move more of the rehashing to the "prepare" phase, by having H_ENTER and other hcalls update both current and pending HPTs. That's a project for another day, but should be possible without any changes to the guest interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	30f4b05bd0	pseries: Stubs for HPT resizing This introduces stub implementations of the H_RESIZE_HPT_PREPARE and H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR extension to allow run time resizing of a guest's hash page table. It also adds a new machine property for controlling whether this new facility is available. For now we only allow resizing with TCG, allowing it with KVM will require kernel changes as well. Finally, it adds a new string to the hypertas property in the device tree, advertising to the guest the availability of the HPT resizing hypercalls. This is a tentative suggested value, and would need to be standardized by PAPR before being merged. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-17 15:07:05 +10:00
David Gibson	7843c0d60d	pseries: Move CPU compatibility property to machine Server class POWER CPUs have a "compat" property, which is used to set the backwards compatibility mode for the processor. However, this only makes sense for machine types which don't give the guest access to hypervisor privilege - otherwise the compatibility level is under the guest's control. To reflect this, this removes the CPU 'compat' property and instead creates a 'max-cpu-compat' property on the pseries machine. Strictly speaking this breaks compatibility, but AFAIK the 'compat' option was never (directly) used with -device or device_add. The option was used with -cpu. So, to maintain compatibility, this patch adds a hack to the cpu option parsing to strip out any compat options supplied with -cpu and set them on the machine property instead of the now deprecated cpu property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
Suraj Jitindar Singh	60694bc678	target/ppc: Fixup set_spr error in h_register_process_table set_spr is used in the function h_register_process_table() to update the LPCR_GTSE and LPCR_UPRT values based on the flags passed by the guest. The set_spr function takes the last two arguments mask and value used to mask and set the value of the spr respectively. The current call site passes these arguments in the wrong order and thus bot GTSE and UPRT will be set irrespective, which is obviously incorrect. Rearrange the function call so that these arguments are passed in the correct order and the correct behaviour is exhibited. It is worth noting that this wasn't detected earlier since these were always both set in all cases where this H_CALL was made. Fixes: `6de833070c` ("target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-06 08:53:24 +10:00
Stefan Hajnoczi	5bb0d22cb4	ppc patch queue 2017-05-25 Assorted accumulated patches. These are nearly all bugfixes at one level or another - some for longstanding problems, others for some regressions caused by more recent cleanups. This includes preliminary patches towards fixing migration for Radix Page Table guests under POWER9 and also fixing some migration regressions due to the re-organization of the interrupt controller code. Not all the pieces are there yet, so those still won't quite work, but the preliminary changes make sense on their own. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZJlRoAAoJEGw4ysog2bOS4m0P/0fm0k9znGQ8jpbGDJ18PF4g Z7rhEcz5Ab1f5xn+ujYSc23ViJ0wgonhQB0F2d02O50Br0Gu2zN1XMrstysUEN/6 qg7nngsDqe+mGFMXASNb+YIzK4mYZQXmW8qscVm6fdaGXq/tZ13zMRPoRHdJQpsg uN/uDWvQqwZO4RizKFbXlosoeNS1Q4c+Bm5MszV+B6TfVvgNd81Od7rjY/ucj4tr 9e8oG3lx1YpRjg6XN3uT/AEtPxgUe6hAS5RlsAWk/B0FBUK6JvRSaDAS8ojg8UIg 8cPWix5OrHQSpjcTsNW3X2FRb31O8YvExPYFHrVZeVhaB5HzVLPXEudeSIMiuqjn CfZxRz6+IToWUJWFn30NozfJUwgQlJ2sf92CHcmMKHu2Zd/hUWdApIukmEFY43Y5 jyhDkubrRtSsCcR6wd4mGeAg2iQWubSOPFdM/TAGzlbGWoT4qXBK1Ol03DaiF971 fkxWaHrmgiKhe8G1sUIZXfDDxpTIvFv1bcmGOnhGmsELFh65bMXVLmwjNvVK9fdE hTuWibRPPE3btyI4eOMbtVdooliCfp+0XvraACnuOXQlgD1bqCPSrnsS2HLPiDS+ npRKlHGlf4cYSVCeTCjmsAVIqzsDfyvpd67qP3xPsaX/pxI/i+I2H9usZWWJBXMp I5M78EL5NCkMnZgYIFad =nlnV -----END PGP SIGNATURE----- Merge remote-tracking branch 'dgibson/tags/ppc-for-2.10-20170525' into staging ppc patch queue 2017-05-25 Assorted accumulated patches. These are nearly all bugfixes at one level or another - some for longstanding problems, others for some regressions caused by more recent cleanups. This includes preliminary patches towards fixing migration for Radix Page Table guests under POWER9 and also fixing some migration regressions due to the re-organization of the interrupt controller code. Not all the pieces are there yet, so those still won't quite work, but the preliminary changes make sense on their own. # gpg: Signature made Thu 25 May 2017 04:50:00 AM BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * dgibson/tags/ppc-for-2.10-20170525: xics: add unrealize handler hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release hw/ppc: migrating the DRC state of hotplugged devices hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState spapr: add pre_plug function for memory pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types pseries: Split CAS PVR negotiation out into a separate function spapr: fix error reporting in xics_system_init() spapr_cpu_core: drop reference on ICP object during CPU realization hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry spapr: ensure core_slot isn't NULL in spapr_core_unplug() xics_kvm: cache already enabled vCPU ids spapr: Consolidate HPT freeing code into a routine spapr-cpu-core: release ICP object when realization fails spapr: sanitize error handling in spapr_ics_create() ppc/xics: simplify prototype of xics_spapr_init() target/ppc: reset reservation in do_rfi() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 09:44:58 +01:00
David Gibson	80c33d343f	pseries: Split CAS PVR negotiation out into a separate function Guests of the qemu machine type go through a feature negotiation process known as "client architecture support" (CAS) during early boot. This does a number of things, one of which is finding a CPU compatibility mode which can be supported by both guest and host. In fact the CPU negotiation is probably the single most complex part of the CAS process, so this splits it out into a helper function. We've recently made some mistakes in maintaining backward compatibility for old machine types here. Splitting this out will also make it easier to fix this. This also adds a possibly useful error message if the negotiation fails (i.e. if there isn't a CPU mode that's suitable for both guest and host). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-05-24 11:39:53 +10:00
Bharata B Rao	06ec79e865	spapr: Consolidate HPT freeing code into a routine Consolidate the code that frees HPT into a separate routine spapr_free_hpt() as the same chunk of code is called from two places. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Eric Blake	cf83f14005	shutdown: Add source information to SHUTDOWN and RESET Time to wire up all the call sites that request a shutdown or reset to use the enum added in the previous patch. It would have been less churn to keep the common case with no arguments as meaning guest-triggered, and only modified the host-triggered code paths, via a wrapper function, but then we'd still have to audit that I didn't miss any host-triggered spots; changing the signature forces us to double-check that I correctly categorized all callers. Since command line options can change whether a guest reset request causes an actual reset vs. a shutdown, it's easy to also add the information to reset requests. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> [ppc parts] Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [SPARC part] Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x parts] Message-Id: <20170515214114.15442-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Suraj Jitindar Singh	6de833070c	target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE to choose determine how address translation is performed. Currently these bits in the LPCR are only set for the cpu which handles the H_CALL, however they need to be set for all cpus for that guest as address translation cannot be performed differently on a per cpu basis. Update the H_CALL handler to set these bits in the LPCR correctly for all cpus of the guest. Note it is the reponsibility of the guest to ensure that any secondary cpus are suspended when the H_CALL is made and thus we can safely update these values here. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Sam Bobroff	e957f6a9b9	spapr: Workaround for broken radix guests For a little while around 4.9, Linux kernels that saw the radix bit in ibm,pa-features would attempt to set up the MMU as if they were a hypervisor, even if they were a guest, which would cause them to crash. Work around this by detecting pre-ISA 3.0 guests by their lack of that bit in option vector 1, and then removing the radix bit from ibm,pa-features. Note: This now requires regeneration of that node after CAS negotiation. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	9fb4541f58	spapr: Enable ISA 3.0 MMU mode selection via CAS Add the new node, /chosen/ibm,arch-vec-5-platform-support to the device tree. This allows the guest to determine which modes are supported by the hypervisor. Update the option vector processing in h_client_architecture_support() to handle the new MMU bits. This allows guests to request hash or radix mode and QEMU to create the guest's HPT at this time if it is necessary but hasn't yet been done. QEMU will terminate the guest if it requests an unavailable mode, as required by the architecture. Extend the ibm,pa-features node with the new ISA 3.0 values and set the radix bit if KVM supports radix mode. This probably won't be used directly by guests to determine the availability of radix mode (that is indicated by the new node added above) but the architecture requires that it be set when the hardware supports it. If QEMU is using KVM, and KVM is capable of running in radix mode, guests can be run in real-mode without allocating a HPT (because KVM will use a minimal RPT). So in this case, we avoid creating the HPT at reset time and later (during CAS) create it if it is necessary. ISA 3.0 guests will now begin to call h_register_process_table(), which has been added previously. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Strip some unneeded prefix from error messages] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh	b4db54132f	target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the hypervisor where in memory its process table is and how translation should be performed using this process table. Provide the implementation of this H_CALL for a guest. We first check for invalid flags, then parse the flags to determine the operation, and then check the other parameters for valid values based on the operation (register new table/deregister table/maintain registration). The process table is then stored in the appropriate location and registered with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits are updated as required. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Correct missing prototype and uninitialized variable] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh	d77a98b015	target/ppc: Add new H-CALL shells for in memory table translation The use of the new in memory tables introduced in ISAv3.00 for translation, also referred to as process tables, requires the introduction of 3 new H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID. Add shells for each of these and register them as the hypercall handlers. Currently they all log an unimplemented hypercall and return H_FUNCTION. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
David Gibson	e57ca75ce3	target/ppc: Manage external HPT via virtual hypervisor The pseries machine type implements the behaviour of a PAPR compliant hypervisor, without actually executing such a hypervisor on the virtual CPU. To do this we need some hooks in the CPU code to make hypervisor facilities get redirected to the machine instead of emulated internally. For hypercalls this is managed through the cpu->vhyp field, which points to a QOM interface with a method implementing the hypercall. For the hashed page table (HPT) - also a hypervisor resource - we use an older hack. CPUPPCState has an 'external_htab' field which when non-NULL indicates that the HPT is stored in qemu memory, rather than within the guest's address space. For consistency - and to make some future extensions easier - this merges the external HPT mechanism into the vhyp mechanism. Methods are added to vhyp for the basic operations the core hash MMU code needs: map_hptes() and unmap_hptes() for reading the HPT, store_hpte() for updating it and hpt_mask() to retrieve its size. To match this, the pseries machine now sets these vhyp fields in its existing vhyp class, rather than reaching into the cpu object to set the external_htab field. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-03-01 11:23:39 +11:00
David Gibson	36778660d7	target/ppc: Eliminate htab_base and htab_mask variables CPUPPCState includes fields htab_base and htab_mask which store the base address (GPA) and size (as a mask) of the guest's hashed page table (HPT). These are set when the SDR1 register is updated. Keeping these in sync with the SDR1 is actually a little bit fiddly, and probably not useful for performance, since keeping them expands the size of CPUPPCState. It also makes some upcoming changes harder to implement. This patch removes these fields, in favour of calculating them directly from the SDR1 contents when necessary. This does make a change to the behaviour of attempting to write a bad value (invalid HPT size) to the SDR1 with an mtspr instruction. Previously, the bad value would be stored in SDR1 and could be retrieved with a later mfspr, but the HPT size as used by the softmmu would be, clamped to the allowed values. Now, writing a bad value is treated as a no-op. An error message is printed in both new and old versions. I'm not sure which behaviour, if either, matches real hardware. I don't think it matters that much, since it's pretty clear that if an OS writes a bad value to SDR1, it's not going to boot. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-03-01 11:23:39 +11:00
David Gibson	7222b94a83	target/ppc: Cleanup HPTE accessors for 64-bit hash MMU Accesses to the hashed page table (HPT) are complicated by the fact that the HPT could be in one of three places: 1) Within guest memory - when we're emulating a full guest CPU at the hardware level (e.g. powernv, mac99, g3beige) 2) Within qemu, but outside guest memory - when we're emulating user and supervisor instructions within TCG, but instead of emulating the CPU's hypervisor mode, we just emulate a hypervisor's behaviour (pseries in TCG or KVM-PR) 3) Within the host kernel - a pseries machine using KVM-HV acceleration. Mostly accesses to the HPT are handled by KVM, but there are a few cases where qemu needs to access it via a special fd for the purpose. In order to batch accesses to the fd in case (3), we use a somewhat awkward ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case (3) reads / releases several HPTEs from the kernel as a batch (usually a whole PTEG). For cases (1) & (2) it just returns an address value. The actual HPTE load helpers then need to interpret the returned token differently in the 3 cases. This patch keeps the same basic structure, but simplfiies the details. First start_access() / stop_access() are renamed to map_hptes() and unmap_hptes() to make their operation more obvious. Second, map_hptes() now always returns a qemu pointer, which can always be used in the same way by the load_hpte() helpers. In case (1) it comes from address_space_map() in case (2) directly from qemu's HPT buffer and in case (3) from a temporary buffer read from the KVM fd. While we're at it, make things a bit more consistent in terms of types and variable names: avoid variables named 'index' (it shadows index(3) which can lead to confusing results), use 'hwaddr ptex' for HPTE indices and uint64_t for each of the HPTE words, use ptex throughout the call stack instead of pte_offset in some places (we still need that at the bottom layer, but nowhere else). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
David Gibson	c6404adebf	pseries: Minor cleanups to HPT management hypercalls * Standardize on 'ptex' instead of 'pte_index' for HPTE index variables for consistency and brevity * Avoid variables named 'index'; shadowing index(3) from libc can lead to surprising bugs if the variable is removed, because compiler errors might not appear for remaining references * Clarify index calculations in h_enter() - we have two cases, H_EXACT where the exact HPTE slot is given, and !H_EXACT where we search for an empty slot within the hash bucket. Make the calculation more consistent between the cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-03-01 11:23:39 +11:00
David Gibson	f6f242c757	ppc: Add ppc_set_compat_all() Once a compatiblity mode is negotiated with the guest, h_client_architecture_support() uses run_on_cpu() to update each CPU to the new mode. We're going to want this logic somewhere else shortly, so make a helper function to do this global update. We put it in target-ppc/compat.c - it makes as much sense at the CPU level as it does at the machine level. We also move the cpu_synchronize_state() into ppc_set_compat(), since it doesn't really make any sense to call that without synchronizing state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:14 +11:00
David Gibson	152ef803ce	pseries: Rewrite CAS PVR compatibility logic During boot, PAPR guests negotiate CPU model support with the ibm,client-architecture-support mechanism. The logic to implement this in qemu is very convoluted. This cleans it up to be cleaner, using the new ppc_check_compat() call. The new logic for choosing a compatibility mode is: 1. Usually, use the most recent compatibility mode that is a) supported by the guest b) supported by the CPU and c) no later than the maximum allowed (if specified) 2. If no suitable compatibility mode was found, the guest does support this CPU explicitly, and no maximum compatibility mode is specified, then use "raw" mode for the current CPU 3. Otherwise, fail the boot. This differs from the results of the old code: the old code preferred using "raw" mode to a compatibility mode, whereas the new code prefers a compatibility mode if available. Using compatibility mode preferentially means that we're more likely to be able to migrate the guest to a similar but not identical host. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:14 +11:00
Nicholas Piggin	1c7ad77e56	ppc/spapr: implement H_SIGNAL_SYS_RESET The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset exception on CPUs within the same guest -- all CPUs, all-but-self, or a specific CPU (including self). This has not made its way to a PAPR release yet, but we have an hcall number assigned. H_SIGNAL_SYS_RESET = 0x380 Syntax: hcall(uint64 H_SIGNAL_SYS_RESET, int64 target); Generate a system reset NMI on the threads indicated by target. Values for target: -1 = target all online threads including the caller -2 = target all online threads except for the caller All other negative values: reserved Positive values: The thread to be targeted, obtained from the value of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF device tree. Semantics: - Invalid target: return H_Parameter. - Otherwise: Generate a system reset NMI on target thread(s), return H_Success. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
David Gibson	d6e166c082	ppc: Rename cpu_version to compat_pvr The 'cpu_version' field in PowerPCCPU is badly named. It's named after the 'cpu-version' device tree property where it is advertised, but that meaning may not be obvious in most places it appears. Worse, it doesn't even really correspond to that device tree property. The property contains either the processor's PVR, or, if the CPU is running in a compatibility mode, a special "logical PVR" representing which mode. Rename the cpu_version field, and a number of related variables to compat_pvr to make this clearer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com>	2017-01-31 10:10:13 +11:00
David Gibson	5b120785e7	pseries: Make cpu_update during CAS unconditional spapr_h_cas_compose_response() includes a cpu_update parameter which controls whether it includes updated information on the CPUs in the device tree fragment returned from the ibm,client-architecture-support (CAS) call. Providing the updated information is essential when CAS has negotiated compatibility options which require different cpu information to be presented to the guest. However, it should be safe to provide in other cases (it will just override the existing data in the device tree with identical data). This simplifies the code by removing the parameter and always providing the cpu update information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-01-31 10:10:13 +11:00
Vincent Palatin	b39466269b	kvm: move cpu synchronization code Move the generic cpu_synchronize_ functions to the common hw_accel.h header, in order to prepare for the addition of a second hardware accelerator. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Message-Id: <f5c3cffe8d520011df1c2e5437bb814989b48332.1484045952.git.vpalatin@chromium.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-19 22:07:46 +01:00
Peter Maydell	6bc56d317f	Base patches for MTTCG enablement. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJYF07FFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D ppoIAI4AxWocso5WIUH6uEHjOAxw9ZNhZ92nF8VtcbvGtN/eh8Qk4jfRX+W/Jl0q D13Rm3m8ynNHqh8YFs+O6i/WSgxHGxKwb75mNr36HDnYnMFluTvRQkvYJUXRyRuL CVtNgy8+q8FbbWo+NiJ5I7gfk2Si4BQfZN0uCLqGuCwqvvA/spN13xUcpeBXEKhL TeDGZBT/atDnT2bRcve8E8g5/0RKjTL9EB0jwfJjHocT5bs+toPe6js9VnZDRNWN ZldcONgEHj3zAj9j7hTkVWFTGPSCx/tt6y6JeORq1oxk0mCCswEk0U9A3hLzLjc/ 94XHsLaEoZ7HNAKtkLc07NYhkQM= =+6Sj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-mttcg' into staging Base patches for MTTCG enablement. # gpg: Signature made Mon 31 Oct 2016 14:01:41 GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream-mttcg: tcg: move locking for tb_invalidate_phys_page_range up *_run_on_cpu: introduce run_on_cpu_data type cpus: re-factor out handle_icount_deadline tcg: cpus rm tcg_exec_all() tcg: move tcg_exec_all and helpers above thread fn target-arm/arm-powerctl: wake up sleeping CPUs tcg: protect translation related stuff with tb_lock. translate-all: Add assert_(memory\|tb)_lock annotations linux-user/elfload: ensure mmap_lock() held while setting up tcg: comment on which functions have to be called with tb_lock held cpu-exec: include cpu_index in CPU_LOG_EXEC messages translate-all: add DEBUG_LOCKING asserts translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH cpus: make all_vcpus_paused() return bool Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-10-31 15:29:12 +00:00
Paolo Bonzini	14e6fe12a7	_run_on_cpu: introduce run_on_cpu_data type This changes the _run_on_cpu APIs (and helpers) to pass data in a run_on_cpu_data type instead of a plain void *. This is because we sometimes want to pass a target address (target_ulong) and this fails on 32 bit hosts emulating 64 bit guests. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161027151030.20863-24-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-31 15:00:25 +01:00
Michael Roth	6787d27b04	spapr: add option vector handling in CAS-generated resets In some cases, ibm,client-architecture-support calls can fail. This could happen in the current code for situations where the modified device tree segment exceeds the buffer size provided by the guest via the call parameters. In these cases, QEMU will reset, allowing an opportunity to regenerate the device tree from scratch via boot-time handling. There are potentially other scenarios as well, not currently reachable in the current code, but possible in theory, such as cases where device-tree properties or nodes need to be removed. We currently don't handle either of these properly for option vector capabilities however. Instead of carrying the negotiated capability beyond the reset and creating the boot-time device tree accordingly, we start from scratch, generating the same boot-time device tree as we did prior to the CAS-generated and the same device tree updates as we did before. This could (in theory) cause us to get stuck in a reset loop. This hasn't been observed, but depending on the extensiveness of CAS-induced device tree updates in the future, could eventually become an issue. Address this by pulling capability-related device tree updates resulting from CAS calls into a common routine, spapr_dt_cas_updates(), and adding an sPAPROptionVector* parameter that allows us to test for newly-negotiated capabilities. We invoke it as follows: 1) When ibm,client-architecture-support gets called, we call spapr_dt_cas_updates() with the set of capabilities added since the previous call to ibm,client-architecture-support. For the initial boot, or a system reset generated by something other than the CAS call itself, this set will consist of all options supported both the platform and the guest. For calls to ibm,client-architecture-support immediately after a CAS-induced reset, we call spapr_dt_cas_updates() with only the set of capabilities added since the previous call, since the other capabilities will have already been addressed by the boot-time device-tree this time around. In the unlikely event that capabilities are removed since the previous CAS, we will generate a CAS-induced reset. In the unlikely event that we cannot fit the device-tree updates into the buffer provided by the guest, well generate a CAS-induced reset. 2) When a CAS update results in the need to reset the machine and include the updates in the boot-time device tree, we call the spapr_dt_cas_updates() using the full set of negotiated capabilities as part of the reset path. At initial boot, or after a reset generated by something other than the CAS call itself, this set will be empty, resulting in what should be the same boot-time device-tree as we generated prior to this patch. For CAS-induced reset, this routine will be called with the full set of capabilities negotiated by the platform/guest in the previous CAS call, which should result in CAS updates from previous call being accounted for in the initial boot-time device tree. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Changed an int -> bool conversion to be more explicit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:26 +11:00
Michael Roth	facdb8b63b	spapr_hcall: use spapr_ovec_* interfaces for CAS options Currently we access individual bytes of an option vector via ldub_phys() to test for the presence of a particular capability within that byte. Currently this is only done for the "dynamic reconfiguration memory" capability bit. If that bit is present, we pass a boolean value to spapr_h_cas_compose_response() to generate a modified device tree segment with the additional properties required to enable this functionality. As more capability bits are added, will would need to modify the code to add additional option vector accesses and extend the param list for spapr_h_cas_compose_response() to include similar boolean values for these parameters. Avoid this by switching to spapr_ovec_* helpers so we can do all the parsing in one shot and then test for these additional bits within spapr_h_cas_compose_response() directly. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:26 +11:00
Alex Bennée	e0eeb4a21a	cpus: pass CPUState to run_on_cpu helpers CPUState is a fairly common pointer to pass to these helpers. This means if you need other arguments for the async_run_on_cpu case you end up having to do a g_malloc to stuff additional data into the routine. For the current users this isn't a massive deal but for MTTCG this gets cumbersome when the only other parameter is often an address. This adds the typedef run_on_cpu_func for helper functions which has an explicit CPUState * passed as the first parameter. All the users of run_on_cpu and async_run_on_cpu have had their helpers updated to use CPUState where available. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [Sergey Fedorov: - eliminate more CPUState in user data; - remove unnecessary user data passing; - fix target-s390x/kvm.c and target-s390x/misc_helper.c] Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts) Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 11:57:29 +02:00
Nikunj A Dadhania	d76ab5e1c7	target-ppc: tlbie/tlbivax should have global effect tlbie (BookS) and tlbivax (BookE) plus the H_CALLs(pseries) should have a global effect. Introduces TLB_NEED_GLOBAL_FLUSH flag. During lazy tlb flush, after taking care of pending local flushes, check broadcast flush(at context synchronizing event ptesync/tlbsync, etc) is needed. Depending on the bitmask state of the tlb_need_flush, tlb is flushed from other cpus if needed and the flags are cleared. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Use 'true' instead of '1' for call to check_tlb_flush()] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 12:39:07 +10:00
Nikunj A Dadhania	e3cffe6fad	target-ppc: add flag in check_tlb_flush() We flush the qemu TLB lazily. check_tlb_flush is called whenever we hit a context synchronizing event or instruction that requires a pending flush to be performed. However, we fail to handle broadcast TLB flush operations. In order to fix that efficiently, we want to differentiate whether check_tlb_flush() needs to only apply pending local flushes (isync instructions, interrupts, ...) or also global pending flush operations. The latter is only needed when executing instructions that are defined architecturally as synchronizing global TLB flush operations. This in our case is ptesync on BookS and tlbsync on BookE along with the paravirtualized hypervisor calls. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [dwg: Changed gen_check_tlb_flush() to also take a bool, and fixed some spelling errors in commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 12:39:07 +10:00
Cédric Le Goater	1f0252e66e	ppc: simplify ppc_hash64_hpte_page_shift_noslb() The segment page shift parameter is never used. Let's remove it. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Aneesh Kumar K.V	c117590769	powerpc/mm: Update the WIMG check during H_ENTER Support for 0 value for memeory coherence is optional and with ppc64 we can always enable memory coherence. Linux kernel did that during the development of 4.7 kernel. But that resulted in failure in Qemu in H_ENTER hcall due to below check. The mentioned change was reverted in the kernel and kernel right now enable memory coherence only if cache inhibited is not set. Nevertheless update qemu WIMG flag check to cover the case where we enable memory coherence along with cache inhibited flag. In order to handle older and newer kernel version consider both Cache inhibitted and (cache inhibitted \| memory conference) as valid values for wimg flags. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-22 11:12:17 +10:00
Thomas Huth	b30ff227c2	ppc: Add PowerISA 2.07 compatibility mode Make sure that guests can use the PowerISA 2.07 CPU sPAPR compatibility mode when they request it and the target CPU supports it. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-14 10:41:38 +10:00
Thomas Huth	8cd2ce7aaa	ppc: Split pcr_mask settings into supported bits and the register mask The current pcr_mask values are ambiguous: Should these be the mask that defines valid bits in the PCR register? Or should these rather indicate which compatibility levels are possible? Anyway, POWER6 and POWER7 should certainly not use the same values here. So let's introduce an additional variable "pcr_supported" here which is used to indicate the valid compatibility levels, and use pcr_mask to signal the valid bits in the PCR register. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-14 10:41:38 +10:00
Thomas Huth	7386ae6372	ppc/spapr: Refactor h_client_architecture_support() CPU parsing code The h_client_architecture_support() function has become quite big and nested already. So factor out the code that takes care of the sPAPR compatibility PVRs (which will be modified by the following patches). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-14 10:41:37 +10:00
Benjamin Herrenschmidt	cd0c6f4735	ppc: Do some batching of TCG tlb flushes On ppc64 especially, we flush the tlb on any slbie or tlbie instruction. However, those instructions often come in bursts of 3 or more (context switch will favor a series of slbie's for example to an slbia if the SLB has less than a certain number of entries in it, and tlbie's can happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries at a time. Doing a tlb_flush() each time is a waste of time. We end up doing a memset of the whole TLB, reloading it for the next instruction, memset'ing again, etc... Those instructions don't have to take effect immediately. For slbie, they can wait for the next context synchronizing event. For tlbie, the next tlbsync. This implements batching by keeping a flag that indicates that we have a TLB in need of flushing. We check it on interrupts, rfi's, isync's and tlbsync and flush the TLB if needed. This reduces the number of tlb_flush() on a boot to a ubuntu installer first dialog screen from roughly 360K down to 36K. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: added a 'CPUPPCState *' variable in h_remove() and h_bulk_remove() ] Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: removed spurious whitespace change, use 0/1 not true/false consistently, since tlb_need_flush has int type] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-05-30 13:20:04 +10:00
Paolo Bonzini	63c915526d	cpu: move exec-all.h inclusion out of cpu.h exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:29 +02:00
Paolo Bonzini	03dd024ff5	hw: explicitly include qemu/log.h Move the inclusion out of hw/hw.h, most files do not need it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:29 +02:00
Paolo Bonzini	77ac58ddc6	dma: do not depend on kvm_enabled() Memory barriers are needed also by Xen and, when the ioeventfd bugs are fixed, by TCG as well. sysemu/kvm.h is not anymore needed in sysemu/dma.h, move it to the actual users. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:28 +02:00
Cédric Le Goater	5c94b2a5e5	ppc: Rework POWER7 & POWER8 exception model From: Benjamin Herrenschmidt <benh@kernel.crashing.org> This patch fixes the current AIL implementation for POWER8. The interrupt vector address can be calculated directly from LPCR when the exception is handled. The excp_prefix update becomes useless and we can cleanup the H_SET_MODE hcall. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: Removed LPES0/1 handling for HV vs. !HV Fixed LPCR_ILE case for POWERPC_EXCP_POWER8 ] Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> [dwg: This was written as a cleanup, but it also fixes a real bug where setting an alternative interrupt location would not be correctly migrated] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-04-05 10:38:24 +10:00
Markus Armbruster	da34e65cb4	include/qemu/osdep.h: Don't include qapi/error.h Commit `57cb38b` included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:15 +01:00
David Gibson	c18ad9a54b	target-ppc: Eliminate kvmppc_kern_htab global `fa48b43` "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM" purports to remove a hack in the handling of hash page tables (HPTs) managed by KVM instead of qemu. However, it actually went in the wrong direction. That patch requires anything looking for an external HPT (that is one not managed by the guest itself) to check both env->external_htab (for a qemu managed HPT) and kvmppc_kern_htab (for a KVM managed HPT). That's a problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places which need to check for an external HPT are outside that, such as kvm_arch_get_registers(). The latter was subtly broken by the earlier patch such that gdbstub can no longer access memory. Basically a KVM managed HPT is much more like a qemu managed HPT than it is like a guest managed HPT, so the original "hack" was actually on the right track. This partially reverts `fa48b43`, so we again mark a KVM managed external HPT by putting a special but non-NULL value in env->external_htab. It then goes further, using that marker to eliminate the kvmppc_kern_htab global entirely. The ppc_hash64_set_external_hpt() helper function is extended to set that marker if passed a NULL value (if you're setting an external HPT, but don't have an actual HPT to set, the assumption is that it must be a KVM managed HPT). This also has some flow-on changes to the HPT access helpers, required by the above changes. Reported-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>	2016-03-16 09:55:06 +11:00
Thomas Huth	3240dd9a69	hw/ppc/spapr: Implement the h_page_init hypercall This hypercall either initializes a page with zeros, or copies another page. According to LoPAPR, the i-cache of the page should also be flushed if using H_ICACHE_INVALIDATE or H_ICACHE_SYNCHRONIZE, and the d-cache should be synchronized to the RAM if the H_ICACHE_SYNCHRONIZE flag is used. For this, two new functions are introduced, kvmppc_dcbst_range() and kvmppc_icbi()_range, which use the corresponding assembler instructions to flush the caches if running with KVM on Power. If the code runs with TCG instead, the code only uses tb_flush(), assuming that this will be enough for synchronization. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-02-25 13:58:44 +11:00
Thomas Huth	e49ff266f8	hw/ppc/spapr: Implement the h_set_xdabr hypercall The H_SET_XDABR hypercall is similar to H_SET_DABR, but also sets the extended DABR (DABRX) register. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-02-17 09:59:30 +11:00
Thomas Huth	af08a58f0c	hw/ppc/spapr: Implement h_set_dabr According to LoPAPR, h_set_dabr should simply set DABRX to 3 (if the register is available), and load the parameter into DABR. If DABRX is not available, the hypervisor has to check the "Breakpoint Translation" bit of the DABR register first. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-02-17 09:59:30 +11:00
Thomas Huth	423576f771	hw/ppc/spapr: Add h_set_sprg0 hypercall This is a very simple hypercall that only sets up the SPRG0 register for the guest (since writing to SPRG0 was only permitted to the hypervisor in older versions of the PowerISA). Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-02-17 09:59:30 +11:00
David Gibson	1114e712c9	target-ppc: Helper to determine page size information from hpte alone h_enter() in the spapr code needs to know the page size of the HPTE it's about to insert. Unlike other paths that do this, it doesn't have access to the SLB, so at the moment it determines this with some open-coded tests which assume POWER7 or POWER8 page size encodings. To make this more flexible add ppc_hash64_hpte_page_shift_noslb() to determine both the "base" page size per segment, and the individual effective page size from an HPTE alone. This means that the spapr code should now be able to handle any page size listed in the env->sps table. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Alexander Graf <agraf@suse.de>	2016-01-30 23:49:27 +11:00
David Gibson	61a36c9b5a	target-ppc: Add new TLB invalidate by HPTE call for hash64 MMUs When HPTEs are removed or modified by hypercalls on spapr, we need to invalidate the relevant pages in the qemu TLB. Currently we do that by doing some complicated calculations to work out the right encoding for the tlbie instruction, then passing that to ppc_tlb_invalidate_one()... which totally ignores the argument and flushes the whole tlb. Avoid that by adding a new flush-by-hpte helper in mmu-hash64.c. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Alexander Graf <agraf@suse.de>	2016-01-30 23:49:27 +11:00
David Gibson	7ef23068bf	target-ppc: Convert mmu-hash{32,64}.[ch] from CPUPPCState to PowerPCCPU Like a lot of places these files include a mixture of functions taking both the older CPUPPCState env and newer PowerPCCPU cpu. Move a step closer to cleaning this up by standardizing on PowerPCCPU, except for the helper_* functions which are called with the CPUPPCState * from tcg. Callers and some related functions are updated as well, the boundaries of what's changed here are a bit arbitrary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de>	2016-01-30 23:37:38 +11:00
David Gibson	ecbc25fa86	pseries: Allow TCG h_enter to work with hotplugged memory The implementation of the H_ENTER hypercall for PAPR guests needs to enforce correct access attributes on the inserted HPTE. This means determining if the HPTE's real address is a regular RAM address (which requires attributes for coherent access) or an IO address (which requires attributes for cache-inhibited access). At the moment this check is implemented with (raddr < machine->ram_size), but that only handles addresses in the base RAM area, not any hotplugged RAM. This patch corrects the problem with a new helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-01-30 23:37:38 +11:00
David Gibson	f9ab1e87ed	ppc: Clean up error handling in ppc_set_compat() Current ppc_set_compat() returns -1 for errors, and also (unconditionally) reports an error message. The caller in h_client_architecture_support() may then report it again using an outdated fprintf(). Clean this up by using the modern error reporting mechanisms. Also add strerror(errno) to the error message. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Markus Armbruster <armbru@redhat.com>	2016-01-30 23:37:37 +11:00
David Gibson	27ac3e06d5	spapr: Remove abuse of rtas_ld() in h_client_architecture_support h_client_architecture_support() uses rtas_ld() for general purpose memory access, despite the fact that it's not an RTAS routine at all and rtas_ld makes things more awkward. Clean this up by replacing rtas_ld() calls with appropriate ldXX_phys() calls. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-01-30 23:37:36 +11:00
Peter Maydell	0d75590d91	ppc: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-6-git-send-email-peter.maydell@linaro.org	2016-01-29 15:07:22 +00:00
Bharata B Rao	03d196b7c5	spapr: Support ibm,dynamic-reconfiguration-memory Parse ibm,architecture.vec table obtained from the guest and enable memory node configuration via ibm,dynamic-reconfiguration-memory if guest supports it. This is in preparation to support memory hotplug for sPAPR guests. This changes the way memory node configuration is done. Currently all memory nodes are built upfront. But after this patch, only memory@0 node for RMA is built upfront. Guest kernel boots with just that and rest of the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory) are built when guest does ibm,client-architecture-support call. Note: This patch needs a SLOF enhancement which is already part of SLOF binary in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2015-09-23 10:51:10 +10:00
Thomas Huth	aaf87c6616	ppc/spapr: Use qemu_log_mask() for hcall_dprintf() To see the output of the hcall_dprintf statements, you currently have to enable the DEBUG_SPAPR_HCALLS macro in include/hw/ppc/spapr.h. This is ugly because a) not every user who wants to debug guest problems can or wants to recompile QEMU to be able to see such issues, and b) since this macro is disabled by default, the code in the hcall_dprintf() brackets tends to bitrot until somebody temporarily enables that macro again. Since the hcall_dprintf statements except one indicate guest problems, let's always use qemu_log_mask(LOG_GUEST_ERROR, ...) for this macro instead. One spot indicated an unimplemented host feature, so this is changed into qemu_log_mask(LOG_UNIMP, ...) instead. Now it's possible to see all those messages by simply adding the CLI parameter "-d guest_errors,unimp", without the need to re-compile the binary. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2015-09-23 10:51:09 +10:00
David Gibson	fb16499418	spapr: Remove obsolete ram_limit field from sPAPRMachineState The ram_limit field was imported from sPAPREnvironment where it predates the machine's ram size being available generically from machine->ram_size. Worse, the existing code was inconsistent about where it got the ram size from. Sometimes it used spapr->ram_limit, sometimes the global 'ram_size' and sometimes a local 'ram_size' masking the global. This cleans up the code to consistently use machine->ram_size, eliminating spapr->ram_limit in the process. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>	2015-07-07 17:44:50 +02:00
David Gibson	28e0204254	spapr: Merge sPAPREnvironment into sPAPRMachineState The code for -machine pseries maintains a global sPAPREnvironment structure which keeps track of general state information about the guest platform. This predates the existence of the MachineState structure, but performs basically the same function. Now that we have the generic MachineState, fold sPAPREnvironment into sPAPRMachineState, the pseries specific subclass of MachineState. This is mostly a matter of search and replace, although a few places which relied on the global spapr variable are changed to find the structure via qdev_get_machine(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>	2015-07-07 17:44:50 +02:00
David Gibson	eefaccc02b	pseries: Switch VGA endian on H_SET_MODE When the guest switches the interrupt endian mode, which essentially means a global machine endian switch, we want to change the VGA framebuffer endian mode as well in order to be backward compatible with existing guests who don't know about the new endian control register. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2015-03-09 15:00:03 +01:00
Peter Maydell	7d0cd464a7	hw/ppc/spapr_hcall.c: Fix typo in function names Fix a typo in the names of a couple of functions (s/resouce/resource/). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2014-09-08 12:50:47 +02:00
Peter Maydell	b653282ecc	hw/ppc/spapr_hcall.c: Add ULL suffix to 64 bit constant Add ULL suffix to 64 bit constant to prevent compiler warnings on some 32 bit platforms. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-07-08 16:03:19 +01:00
Alexey Kardashevskiy	d5ac4f5433	spapr_hcall: Add address-translation-mode-on-interrupt resource in H_SET_MODE This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from the H_SET_MODE, for POWER8 (PowerISA 2.07) only. This defines AIL flags for LPCR special register. This changes @excp_prefix according to the mode, takes effect in TCG. This turns support of a new capability PPC2_ISA207S flag for TCG. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Tom Musta <tommusta@gmail.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2014-06-16 13:24:45 +02:00
Alexey Kardashevskiy	c4015bbd50	spapr_hcall: Split h_set_mode() This moves H_SET_MODE_RESOURCE_LE handler to a separate function as there are other "resources" coming and this is going to become ugly. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Tom Musta <tommusta@gmail.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2014-06-16 13:24:45 +02:00

1 2 3 4

181 Commits