TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever
instantiated. The existence of different classes is mostly a hang
over from when we (misguidedly) had separate subtypes for the KVM and
non-KVM version of the device.
There could be some call for an abstract base type for ICS variants
that use a different representation of their state (PowerNV PHB3 might
want this). The current split isn't really in the right place for
that though. If we need this in future, we can re-implement it more
in line with what we actually need.
So, collapse the two classes together into just TYPE_ICS.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently TYPE_XICS_BASE and TYPE_XICS_SIMPLE have their own reset methods,
using the standard technique for having the subtype call the supertype's
methods before doing its own thing.
But TYPE_XICS_SIMPLE is the only subtype of TYPE_XICS_BASE ever
instantiated, so there's no point having the split here. Merge them
together into just an ics_reset() function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
There are a number of ics_simple_*() functions that aren't actually
specific to TYPE_XICS_SIMPLE at all, and are equally valid on
TYPE_XICS_BASE. Rename them to ics_*() accordingly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently ics_reject(), ics_resend() and ics_eoi() indirect through
class methods. But there's only one implementation of each method,
the one in TYPE_ICS_SIMPLE. TYPE_ICS_BASE has no implementation, but
it's never instantiated, and has no other subtypes.
So clean up by eliminating the method and just having ics_reject(),
ics_resend() and ics_eoi() contain the logic directly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Interface instances should never be directly dereferenced. So, the common
practice is to make them incomplete types to make sure no-one does that.
XICSFrabric, however, had a dummy type which is less safe.
We were also using OBJECT_CHECK() where we should have been using
INTERFACE_CHECK().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
When vCPUs are hotplugged, they are added to the QEMU CPU list before
being fully realized. This can crash the XIVE presenter because the
'tctx' pointer is not necessarily initialized when looking for a
matching target.
These vCPUs are not valid targets for the presenter. Skip them.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191001085722.32755-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
This allows us to remove more endian-specific defines from int_helper.c.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190926204453.31837-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
The ibm,client-architecture-support call is a way for the guest to
negotiate capabilities with a hypervisor. It is implemented as:
- the guest calls SLOF via client interface;
- SLOF calls QEMU (H_CAS hypercall) with an options vector from the guest;
- QEMU returns a device tree diff (which uses FDT format with
an additional header before it);
- SLOF walks through the partial diff tree and updates its internal tree
with the values from the diff.
This changes QEMU to simply re-render the entire tree and send it as
an update. SLOF can handle this already mostly, [1] is needed before this
can be applied. This stores the resulting tree in the spapr machine to have
the latest valid FDT copy possible (this should not matter much as
H_UPDATE_DT happens right after that but nevertheless).
The benefit is reduced code size as there is no need for another set of
DT rendering helpers such as spapr_fixup_cpu_dt().
The downside is that the updates are bigger now (as they include all
nodes and properties) but the difference on a '-smp 256,threads=1' system
before/after is 2.35s vs. 2.5s.
[1] https://patchwork.ozlabs.org/patch/1152915/
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU does not allocate PCI resources (BARs) in any case - coldplug devices
are configured by the firmware and hotplug devices rely on the guest
system to do the assignment via the PCI rescan mechanism. Also in order
to create non empty "assigned-addresses", the device has to be enabled
(i.e. PCI_COMMAND needs the MMIO bit set) first as otherwise
io_regions[i].addr are -1, and devices are not enabled at this point.
This removes "assigned-addresses" and leaves it to those who actually
do resource allocation.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190927022651.71642-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that the parameters to both set_dfp64() and set_dfp128() are exactly the
same, there is no need for an explicit if() statement to determine which
function should be called based upon size. Instead we can simply use the
preprocessor to generate the call to set_dfp##size() directly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Switch over all accesses to the decimal numbers held in struct PPC_DFP from
using HI_IDX and LO_IDX to using the VsrD() macro instead. Not only does this
allow the compiler to ensure that the various dfp_* functions are being passed
a ppc_vsr_t rather than an arbitrary uint64_t pointer, but also allows the
host endian-specific HI_IDX and LO_IDX to be completely removed from
dfp_helper.c.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are several places in dfp_helper.c that access the decimal number
representations in struct PPC_DFP via HI_IDX and LO_IDX defines which are set
at the top of dfp_helper.c according to the host endian.
However we can instead switch to using ppc_vsr_t for decimal numbers and then
make subsequent use of the existing VsrD() macros to access the correct
element regardless of host endian. Note that 64-bit decimals are stored in the
LSB of ppc_vsr_t (equivalent to VsrD(1)).
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Most of the DFP helper functions call decimal{64,128}FromNumber() just before
returning in order to convert the decNumber stored in dfp.t64 back to a
Decimal{64,128} to write back to the FP registers.
Introduce new dfp_finalize_decimal{64,128}() helper functions which both enable
the parameter list to be reduced considerably, and also help minimise the
changes required in the next patch.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr
register array" FP registers are no longer stored consecutively in memory and so
the current method of combining FP register pairs into DFP numbers is incorrect.
Firstly update the definition of the dh_*_fprp defines in helper.h to reflect
that FP registers are now stored as part of an array of ppc_vsr_t elements
rather than plain uint64_t elements, and then introduce a new ppc_fprp_t type
which conceptually represents a DFP even-odd register pair to be consumed by the
DFP helper functions.
Finally update the new DFP {get,set}_dfp{64,128}() helper functions to convert
between DFP numbers and DFP even-odd register pairs correctly, making use of the
existing VsrD() macro to access the correct elements regardless of host endian.
Fixes: ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr register array"
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The existing functions (now incorrectly) assume that the MSB and LSB of DFP
numbers are stored as consecutive 64-bit words in memory. Instead of accessing
the DFP numbers directly, introduce set_dfp{64,128}() helper functions to ease
the switch to the correct representation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The existing functions (now incorrectly) assume that the MSB and LSB of DFP
numbers are stored as consecutive 64-bit words in memory. Instead of accessing
the DFP numbers directly, introduce get_dfp{64,128}() helper functions to ease
the switch to the correct representation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190926185801.11176-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This fixes USB host bus adapter name in the device tree to match QEMU's
one.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We are going to use spapr_build_fdt() for the boot time FDT and as an
update for SLOF during handling of H_CAS. SLOF will apply all properties
from the QEMU's FDT which is usually ok unless there are properties
changed by grub or guest kernel. The properties are:
bootargs, linux,initrd-start, linux,initrd-end, linux,stdout-path,
linux,rtas-base, linux,rtas-entry. Resetting those during CAS will most
likely cause grub failure.
Don't create such properties if we're booting without "-kernel" and
"-initrd" so they won't get included into the DT update blob and
therefore the guest is more likely to boot successfully.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[dwg: Tweaked commit message based on Greg Kurz's input]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The device tree build by QEMU at the machine reset time is used by SLOF
to build its internal device tree but the node names are not preserved
exactly so when QEMU provides a device tree update in response to H_CAS,
it might become tricky to match a node from the update blob to
the actual node in SLOF.
This removed leading zeroes from "memory@" nodes and makes
the DTC checker happy.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Add a missing g_free(fdt) if the resulting tree is bigger
than the space allocated by SLOF.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The number of NUMA nodes in the system is fixed from the command line.
Therefore, there's no need to recalculate it at reset time, and we can
determine the special gpu_numa_id value used for NVLink2 devices at init
time.
This simplifies the reset path a bit which will make further improvements
easier.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Certain old guest versions don't understand the radix MMU introduced with
POWER ISA 3.0, but incorrectly select it if presented with the option at
CAS time. We workaround this in qemu by explicitly excluding the radix
(and other ISA 3.0 linked) options if the guest doesn't explicitly note
support for ISA 3.0.
This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty
vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for.
In addition, we unnecessarily call spapr_populate_pa_features() with
different options when initially constructing the device tree and when
adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest
is already false, so we can still use the flag, rather than explicitly
overriding it to be false at the callsite.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
On POWER8 systems the Directed Privileged Door-bell Exception State
register (DPDES) stores doorbell pending status, one bit per a thread
of a core, set by "msgsndp" instruction. The register is shared among
threads of the same core and KVM on POWER9 emulates it in a similar way
(POWER9 does not have DPDES).
DPDES is shared but QEMU assumes all SPRs are per thread so the only safe
way to write DPDES back to VCPU before running a guest is doing so
while all threads are pulled out of the guest so DPDES cannot change.
There is only one situation when this condition is met: incoming migration
when all threads are stopped. Otherwise any QEMU HMP/QMP command causing
kvm_arch_put_registers() (for example printing registers or dumping memory)
can clobber DPDES in a race with other vcpu threads.
This changes DPDES handling so it is not written to KVM at runtime.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190923084110.34643-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are FPSCR-related defines in target/ppc/cpu.h which can be used in
place of constants and explicit shifts which arguably improve the code a
bit in places.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1568817169-1721-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffsce' instruction.
'mffsce' is identical to 'mffs', except that it also clears the exception
enable bits in the FPSCR.
On CPUs without support for 'mffsce' (below ISA 3.0), the
instruction will execute identically to 'mffs'.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1568817082-1384-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffscrn' and 'mffscrni' instructions.
'mffscrn' and 'mffscrni' are similar to 'mffsl', except they do not return
the status bits (FI, FR, FPRF) and they also set the rounding mode in the
FPSCR.
On CPUs without support for 'mffscrn'/'mffscrni' (below ISA 3.0), the
instructions will execute identically to 'mffs'.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1568817081-1345-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A typical pseries VM with 16 vCPUs, one disk, one network adapater
uses less than 100 interrupts but the whole IRQ number space of the
QEMU machine is allocated at reset time and it is 8K wide. This is
wasting a considerable amount of interrupt numbers in the global IRQ
space which has 1M interrupts per socket on a POWER9.
To optimise the HW resources, only request at the KVM level interrupts
which have been claimed by the guest. This will help to increase the
maximum number of VMs per system and also help supporting nested guests
using the XIVE interrupt mode.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190911133937.2716-3-clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156942766014.1274533.10792048853177121231.stgit@bahia.lan>
[dwg: Folded in fix up from Greg Kurz]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It will help us to discard interrupt numbers which have not been
claimed in the next patch.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190911133937.2716-2-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
add PnvHomer device model to emulate homer memory access
for pstate table, occ-sensors, slw, occ static and dynamic
values for Power8 and Power9 chips.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-4-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
emulate occ common area region with occ sram device model which
occ and skiboot uses it to communicate regarding sensors, slw
and HWMON in PowerNV emulated host.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-3-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
During PowerNV boot skiboot populates the device tree by
retrieving base address of homer/occ common area from
PBA BARs and prd ipoll mask by accessing xscom read/write
accesses.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190912093056.4516-2-bala24@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Unless the machine was started with kernel-irqchip=on, we cannot easily
tell if we're actually using an in-kernel or an emulated irqchip. This
information is important enough that it is worth printing it in 'info
pic'.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156829860985.2073005.5893493824873412773.stgit@bahia.tls.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There were few trailing comments after `/*` instead in
new line and line more than 80 character, these fixes are
trivial and doesn't change any logic in code.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Message-Id: <20190911142925.19197-5-bala24@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Coverity is reporting in CID 1405304 that tpm_execute() may pass a NULL
tpm_proxy->host_path pointer to open(). This is based on the fact that
h_tpm_comm() does a NULL check on tpm_proxy->host_path and then passes
tpm_proxy to tpm_execute().
The check in h_tpm_comm() is abusive actually since a spapr-proxy-tpm
requires a non NULL host_path property, as checked during realize.
Fixes: 0fb6bd0732
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156805260916.1779401.11054185183758185247.stgit@bahia.lan>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fixes the dtc output :
ERROR (node_name_chars): //bmc: Bad character '/' in node name
Warning (avoid_unnecessary_addr_size): /bmc: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190902092932.20200-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When we hotplug a CPU on memory-less/cpu-less node, the linux kernel
crashes.
This happens because linux kernel needs to know the NUMA topology at
start to be able to initialize the distance lookup table.
On pseries, the topology is provided by the firmware via the existing
CPUs and memory information. Thus a node without memory and CPU cannot be
discovered by the kernel.
To avoid the kernel crash, do not allow to start pseries with empty
nodes.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20190830161345.22436-1-lvivier@redhat.com>
[dwg: Rework to cope with movement of numa state from globals to MachineState]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>