On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bit definitions, which have
evolved in the latest specs.
The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23, bits 0-1:
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only
- 0b10 XIVE legacy or exploitation mode
The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only
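As a rough illustration of the encoding above, the sketch below extracts bits 0-1 of byte 23, assuming IBM (MSB-0) bit numbering so that the field is the top two bits of the byte, and checks whether an OS request can be honoured by what the platform advertises. The enum and function names are hypothetical and are not QEMU's spapr_ovec helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the 2-bit XIVE field in byte 23 of vector 5. */
typedef enum {
    XIVE_LEGACY_ONLY  = 0, /* 0b00 */
    XIVE_EXPLOIT_ONLY = 1, /* 0b01 */
    XIVE_EITHER       = 2, /* 0b10, only meaningful on the platform side */
} XiveMode;

/* Assumes MSB-0 numbering: bits 0-1 are the two top bits of the byte. */
static unsigned xive_field(uint8_t byte23)
{
    return (byte23 >> 6) & 0x3;
}

/* Can the platform honour what the OS asked for? */
static bool xive_request_ok(uint8_t platform_byte23, uint8_t os_byte23)
{
    unsigned platform = xive_field(platform_byte23);
    unsigned os = xive_field(os_byte23);

    /* The OS may only ask for legacy-only or exploitation-only. */
    if (os != XIVE_LEGACY_ONLY && os != XIVE_EXPLOIT_ONLY) {
        return false;
    }
    /* A platform supporting both modes accepts either request. */
    if (platform == XIVE_EITHER) {
        return true;
    }
    return platform == os;
}
```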
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
read only. Doing so causes KVM to act, for that VM, as if the host's
SMT mode was the given value. This is particularly important on Power
9 systems because their default value is 1, but they are able to
support values up to 8.
This patch introduces a way to control this capability via a new
machine property called VSMT ("Virtual SMT"). If the value is not set
on the command line a default is chosen that is, when possible,
compatible with legacy systems.
Note that the initialization of KVM_CAP_PPC_SMT has changed slightly
because it has changed (in KVM) from a global capability to a
VM-specific one. This won't cause a problem on older KVMs because VM
capabilities fall back to global ones.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Allow MAL with more RX and TX channels as found in newer versions.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This device appears in other SoCs as well, not just in 405 ones.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines, so move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.
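To make the distinction concrete, here is a minimal sketch of a VCPU ID lookup, assuming a flat array of CPUs that each carry both a cpu_index and a separate VCPU ID. The type and function names are hypothetical; QEMU walks its own CPU list rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU: cpu_index is the linear index,
 * vcpu_id is the (possibly different, possibly sparse) SPAPR VCPU ID. */
typedef struct {
    int cpu_index;
    int vcpu_id;
} SketchCPU;

/* Linear search; returns NULL when no CPU has that VCPU ID. */
static SketchCPU *sketch_get_cpu_by_vcpu_id(SketchCPU *cpus, size_t n,
                                            int vcpu_id)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].vcpu_id == vcpu_id) {
            return &cpus[i];
        }
    }
    return NULL;
}
```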
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].
At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.
In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, a guest kernel
Oops when trying to unplug the device.
A proper fix would be to treat every device hotplugged before CAS
as a coldplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that would need to be managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are open
questions at the QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.
With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.
This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:
- the DRC conditions that warrant a CAS reset are the same as those that
trigger a DRC migration - the DRC must have a device attached and
its state must differ from its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.
- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRCs that require a reset, using spapr_drc_needed. If
so, it returns '1' in 'spapr_h_cas_compose_response', which will set
spapr->cas_reboot to true, causing the machine to reboot.
No changes are made for coldplug devices.
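The decision described in the list above can be sketched roughly as follows. The struct and function names are hypothetical simplifications of the DRC object and of spapr_drc_needed()/spapr_hotplugged_dev_before_cas(), keeping only the fields the check looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified DRC: only the fields the check needs. */
typedef struct {
    bool has_device;   /* a device is attached */
    int state;         /* current DRC state */
    int ready_state;   /* state a coldplugged device would be in */
} SketchDRC;

/* Same condition that makes a DRC need migrating: a device is attached
 * and the DRC is not in its ready (coldplug-equivalent) state. */
static bool sketch_drc_needed(const SketchDRC *drc)
{
    return drc->has_device && drc->state != drc->ready_state;
}

/* Any "needed" DRC means a device was hotplugged before CAS, so return 1
 * to request a CAS reset. */
static int sketch_hotplugged_dev_before_cas(const SketchDRC *drcs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (sketch_drc_needed(&drcs[i])) {
            return 1;
        }
    }
    return 0;
}
```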
[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sPAPR machine doesn't clear the pending events QTAILQ on
machine reboot. This allows unprocessed hotplug/epow events
to persist in the queue after reset and, when the IRQs are reasserted in
check_exception later on, these will be processed by the OS.
This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.
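A minimal sketch of draining such a queue, using the BSD TAILQ macros from <sys/queue.h> as a stand-in for QEMU's own QTAILQ macros; the event type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical event entry; QEMU uses QTAILQ instead of TAILQ. */
typedef struct SketchEvent {
    int type;
    TAILQ_ENTRY(SketchEvent) next;
} SketchEvent;

TAILQ_HEAD(SketchEventQueue, SketchEvent);

static void sketch_queue_event(struct SketchEventQueue *q, int type)
{
    SketchEvent *e = malloc(sizeof(*e));
    if (e == NULL) {
        abort();
    }
    e->type = type;
    TAILQ_INSERT_TAIL(q, e, next);
}

/* Drop every queued event, e.g. from the machine reset handler. */
static void sketch_clear_pending_events(struct SketchEventQueue *q)
{
    while (!TAILQ_EMPTY(q)) {
        SketchEvent *e = TAILQ_FIRST(q);
        TAILQ_REMOVE(q, e, next);
        free(e);
    }
}
```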
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've now implemented a PAPR extension allowing PAPR guests to resize
their hash page table (HPT) during runtime.
This patch makes use of that facility to allocate smaller HPTs by default.
Specifically, when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.
When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.
If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maximum
memory size (unless it's been configured not to allow such guests at all).
For now we make that reallocation assuming the guest has not yet used the
HPT at all. That's true in practice, but not, strictly, an architectural
or PAPR requirement. If we need to in the future, we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).
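To put rough numbers on the saving, the sketch below assumes an HPT of 1/128 of RAM, rounded up to a power of two (the ratio implied by the ~2GiB HPT for a 256GiB guest quoted in the resize patch), clamped to 2^18..2^46 bytes. The ratio, the bounds and the function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of the smallest power of two >= x (x > 0). */
static int log2_ceil(uint64_t x)
{
    int shift = 0;
    assert(x > 0);
    while (shift < 64 && (UINT64_C(1) << shift) < x) {
        shift++;
    }
    return shift;
}

/* Assumed sizing rule: HPT = RAM / 128, clamped to [2^18, 2^46] bytes. */
static int sketch_hpt_shift_for_ramsize(uint64_t ramsize)
{
    int shift = log2_ceil(ramsize) - 7;
    if (shift < 18) {
        shift = 18;
    }
    if (shift > 46) {
        shift = 46;
    }
    return shift;
}
```

Under these assumptions, a guest with 4GiB of initial and 256GiB of maximum memory gets a 32MiB (2^25) HPT instead of a 2GiB (2^31) one.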
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT hypercall completes the transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
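The PREPARE/COMMIT protocol above can be sketched as a small state machine. The return codes, struct and function names here are hypothetical (they are not the PAPR hcall values), and the background preparation thread is simulated by flipping the complete flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes, not the real PAPR hcall values. */
enum { SK_SUCCESS, SK_BUSY, SK_LONG_BUSY, SK_CLOSED, SK_PARAMETER };

/* Hypothetical pending-resize record (cf. pending_hpt). */
typedef struct {
    bool active;    /* a resize has been requested */
    int shift;      /* requested new HPT size, log2 bytes */
    bool complete;  /* the preparation thread has finished */
} SketchPendingHPT;

/* PREPARE: start a new resize, or poll the existing one. */
static int sketch_resize_prepare(SketchPendingHPT *p, int shift)
{
    if (p->active && p->shift == shift) {
        return p->complete ? SK_SUCCESS : SK_LONG_BUSY;
    }
    /* None in progress, or a different size: (re)start preparation.
     * A real implementation would cancel the old thread here. */
    p->active = true;
    p->shift = shift;
    p->complete = false;
    return SK_LONG_BUSY;
}

/* COMMIT: only valid once a matching PREPARE has completed. */
static int sketch_resize_commit(SketchPendingHPT *p, int shift)
{
    if (!p->active) {
        return SK_CLOSED;
    }
    if (p->shift != shift) {
        return SK_PARAMETER;
    }
    if (!p->complete) {
        return SK_BUSY;
    }
    /* ...rehash every HPTE from the old table into the new one... */
    p->active = false;
    return SK_SUCCESS;
}
```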
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow runtime resizing of a guest's hash page table. It
also adds a new machine property for controlling whether this new
facility is available.
For now, we only allow resizing with TCG; allowing it with KVM will require
kernel changes as well.
Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls. This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
e6f7e110ee "ppc/xics: remove the XICSState classes" got rid of
XICSState; this is just a leftover.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to PAPR, the DR-indicator should only be valid for physical DRCs,
not logical DRCs. At the moment we implement it for all DRCs, so restrict
it to physical ones only.
We move the state to the physical DRC subclass, which means adding some
QOM boilerplate to handle the newly distinct type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Most of the time, the state of a DRC object is contained in the single
'state' variable. However, the transition from UNISOLATE to
CONFIGURED state requires multiple calls to the ibm,configure-connector
RTAS call to retrieve the device tree for the attached device. We need
some extra state to keep track of where we're up to in delivering the
device tree information to the guest.
Currently that extra state is in a sPAPRConfigureConnectorState
substructure which is only allocated when we're in the middle of the
configure connector process. That sounds like a good idea, but the extra
state is only two integers - on many platforms that will take up the same
room as the (maybe NULL) ccs pointer even before malloc() overhead. Plus
it's another object whose lifetime we need to manage. In short, it's not
worth it.
So, fold the sPAPRConfigureConnectorState substructure directly into the
DRC object.
Previously the structure was allocated lazily when the configure-connector
call discovers it's not there. Now, we need to initialize the subfields
pre-emptively, as soon as we enter UNISOLATE state.
Although it's not strictly necessary (the field values should only ever
be consulted when in UNISOLATE state), we try to keep them at -1 when in
other states, as a debugging aid.
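A minimal sketch of the folded-in configure-connector state, assuming the two integers are a device tree offset and a node depth. The state names, fields and helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical DRC states, just enough for the sketch. */
typedef enum { SK_ISOLATED, SK_UNISOLATE, SK_CONFIGURED } SketchState;

/* Configure-connector progress lives directly in the DRC. */
typedef struct {
    SketchState state;
    int ccs_offset;  /* assumed: position in the device tree being sent */
    int ccs_depth;   /* assumed: current node depth in that tree */
} SketchDRC;

static void sketch_drc_enter_unisolate(SketchDRC *drc, int fdt_start)
{
    drc->state = SK_UNISOLATE;
    /* Initialise eagerly instead of allocating on the first RTAS call. */
    drc->ccs_offset = fdt_start;
    drc->ccs_depth = 0;
}

static void sketch_drc_leave_unisolate(SketchDRC *drc, SketchState next)
{
    assert(next != SK_UNISOLATE);
    drc->state = next;
    /* Unused outside UNISOLATE; -1 makes stale use easy to spot. */
    drc->ccs_offset = -1;
    drc->ccs_depth = -1;
}
```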
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Each DRC has three fields describing its state: isolation_state,
allocation_state and configured. At first this seems like a reasonable
representation, since it's based directly on the PAPR-defined
isolation-state and allocation-state indicators. However:
* Only a few combinations of the two fields' values are permitted
* allocation_state isn't used at all for physical DRCs
* The indicators are write-only, so they don't really have a
well-defined current value independent of each other
This replaces these variables with a single state variable, whose names
and numbers are based on the diagram in LoPAPR section 13.4. Along with
this we add code to check the current state on various operations and make
sure the requested transition is permitted.
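A table-driven check like the one described can be sketched for logical DRCs as follows. The state names echo those used elsewhere in this log, but the permitted-transition table is a simplified assumption, not a transcription of the LoPAPR section 13.4 diagram, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical logical-DRC states. */
typedef enum {
    LOGICAL_UNUSABLE,
    LOGICAL_AVAILABLE,
    LOGICAL_UNISOLATE,
    LOGICAL_CONFIGURED,
    LOGICAL_NSTATES,
} LogicalState;

/* sketch_allowed[from][to]: an assumed, simplified transition table. */
static const bool sketch_allowed[LOGICAL_NSTATES][LOGICAL_NSTATES] = {
    [LOGICAL_UNUSABLE]   = { [LOGICAL_AVAILABLE] = true },  /* set usable */
    [LOGICAL_AVAILABLE]  = { [LOGICAL_UNUSABLE] = true,     /* set unusable */
                             [LOGICAL_UNISOLATE] = true },  /* unisolate */
    [LOGICAL_UNISOLATE]  = { [LOGICAL_AVAILABLE] = true,    /* isolate */
                             [LOGICAL_CONFIGURED] = true }, /* configured */
    [LOGICAL_CONFIGURED] = { [LOGICAL_AVAILABLE] = true },  /* isolate */
};

/* Apply a transition only if the table permits it. */
static bool sketch_transition(LogicalState *cur, LogicalState next)
{
    if (!sketch_allowed[*cur][next]) {
        return false;
    }
    *cur = next;
    return true;
}
```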
Strictly speaking, this makes guest visible changes to behaviour (since we
probably allowed some transitions we shouldn't have before). However, a
hypothetical guest broken by that wasn't PAPR compliant, and probably
wouldn't have worked under PowerVM.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
'awaiting_release' indicates that the host has requested an unplug of the
device attached to the DRC, but the guest has not (yet) put the device
into a state where it is safe to complete removal.
1. Rename it to 'unplug_requested', which, to me at least, is clearer
2. Remove the ->release_pending() method used to check this from outside
spapr_drc.c. The method only plausibly has one implementation, so use
a plain function (spapr_drc_unplug_requested()) instead.
3. Remove it from the migration stream. Attempting to migrate mid-unplug
is broken not just for spapr - in general management has no good way to
determine if the device should be present on the destination or not. So,
until that's fixed, there's no point adding extra things to the stream.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
This function has two unused parameters - remove them.
It also sets awaiting_release on all paths, except one. On that path
setting it is harmless, since it will be immediately cleared by
spapr_drc_release(). So factor it out of the if statements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The awaiting_allocation flag in the DRC was introduced by aab9913
"spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to
prevent a guest crash on racing attach and detach. Except... information
from the BZ actually suggests a qemu crash, not a guest crash. And there
shouldn't be a problem here anyway: if the guest has already moved the DRC
away from UNUSABLE state, the detach would already be deferred, and if it
hadn't it should be safe to detach it (the guest should fail gracefully
when it attempts to change the allocation state).
I think this was probably just a bandaid for some other problem in the
state management. So, remove awaiting_allocation and associated code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
When migrating a guest which has already had devices hotplugged,
libvirt typically starts the destination qemu with -incoming defer,
adds those hotplugged devices with qmp, then initiates the incoming
migration.
This causes problems for the management of spapr DRC state. Because
the device is treated as hotplugged, it goes into a DRC state for a
device immediately after it's plugged, but before the guest has
acknowledged its presence. However, chances are the guest on the
source machine *has* acknowledged the device's presence and configured
it.
If the source has fully configured the device, then DRC state won't be
sent in the migration stream: for maximum migration compatibility with
earlier versions we don't migrate DRCs in coldplug-equivalent state.
That means that the DRC effectively changes state over the migrate,
causing problems later on.
In addition, logging hotplug events for these devices isn't what we
want because a) those events should already have been issued on the
source host and b) the event queue should get wiped out by the
incoming state anyway.
In short, what we really want is to treat devices added before an
incoming migration as if they were coldplugged.
To do this, we first add a spapr_drc_hotplugged() helper which
determines if the device is hotplugged in the sense relevant for DRC
state management. We only send hotplug events when this is true.
Second, when we add a device which isn't hotplugged in this sense, we
force a reset of the DRC state - this ensures the DRC is in a
coldplug-equivalent state (there isn't usually a system reset between
these device adds and the incoming migration).
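The resulting plug-time decision can be sketched as below, assuming "hotplugged in the DRC sense" means "added at runtime, and not while the destination is still waiting for an incoming migration". The struct, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the decision. */
typedef struct {
    bool dev_hotplugged;     /* device was added after machine start */
    bool incoming_migration; /* destination still waiting for the stream */
} SketchPlugCtx;

/* Assumed meaning of "hotplugged" for DRC state management. */
static bool sketch_drc_hotplugged(const SketchPlugCtx *ctx)
{
    return ctx->dev_hotplugged && !ctx->incoming_migration;
}

typedef enum { SK_SEND_HOTPLUG_EVENT, SK_RESET_DRC } SketchPlugAction;

static SketchPlugAction sketch_plug_action(const SketchPlugCtx *ctx)
{
    /* Anything not hotplugged in this sense is treated as coldplugged:
     * no event, and the DRC is forced to a coldplug-equivalent state. */
    return sketch_drc_hotplugged(ctx) ? SK_SEND_HOTPLUG_EVENT : SK_RESET_DRC;
}
```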
This is based on an earlier patch by Laurent Vivier, cleaned up and
extended.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The rtas_error_log structure is marked packed, which strongly suggests its
precise layout is important to match an external interface. Along with
that one could expect it to have a fixed endianness to match the same
interface. That used to be the case - matching the layout of PAPR RTAS
event format and requiring BE fields.
Now, however, it's only used embedded within sPAPREventLogEntry with the
fields in native order, since they're processed internally.
Clear that up by removing the nested structure in sPAPREventLogEntry.
struct rtas_error_log is moved back to spapr_events.c where it is used as
a temporary to help convert the fields in sPAPREventLogEntry to the correct
in memory format when delivering an event to the guest.
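The conversion step can be sketched as follows, assuming a header of two 32-bit fields (a summary word and an extended length) kept in native order internally and written big-endian into a guest-visible buffer. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal header, kept in native byte order. */
typedef struct {
    uint32_t summary;
    uint32_t extended_length;
} SketchEventHeader;

/* Store a 32-bit value big-endian, independent of host byte order. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write the header in guest (big-endian) format; returns bytes written. */
static size_t sketch_header_to_guest(const SketchEventHeader *hdr,
                                     uint8_t out[8])
{
    store_be32(out, hdr->summary);
    store_be32(out + 4, hdr->extended_length);
    return 8;
}
```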
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In racing situations between hotplug events and migration operation,
an RTAS hotplug event might not yet have been delivered to the source
guest when migration is started. In this case the pending_events of
the spapr state need to be transmitted to the target so that the hotplug
event can be finished on the target.
To achieve the minimal VMSD possible to migrate the pending_events list,
this patch makes the following changes in spapr_events.c:
- 'log_type' of sPAPREventLogEntry struct deleted. This information can be
derived by inspecting the rtas_error_log summary field. A new function
called 'spapr_event_log_entry_type' was added to retrieve the type of
a given sPAPREventLogEntry.
- sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The
only data we're going to migrate in the VMSD is the event log data itself,
which can be divided in two parts: a rtas_error_log header and an extended
event log field. The rtas_error_log header contains information about the
size of the extended log field, which can be used inside VMSD as the size
parameter of the VBUFFER_ALLOC field that will store it. To allow this use,
the header.extended_length field must be exposed inline to the VMSD instead
of embedded into a 'data' field that holds everything. With this in mind,
the following changes were done:
* a new 'header' field was added to sPAPREventLogEntry. This field holds
a struct rtas_error_log inline.
* the declaration of the 'rtas_error_log' struct was moved to spapr.h
to be visible to the VMSD macros.
* 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and
now holds only the contents of the extended event log.
* 'struct rtas_error_log hdr' was removed from both epow_log_full
and hp_log_full. This information is now available in the header field of
sPAPREventLogEntry.
* epow_log_full and hp_log_full were renamed to epow_extended_log and
hp_extended_log respectively. This rename makes it clearer to understand
the new purpose of both structures: hold the information of an extended
event log field.
* spapr_powerdown_req and spapr_hotplug_req_event now create a
sPAPREventLogEntry structure that contains the full rtas log entry.
* rtas_event_log_queue and rtas_event_log_dequeue now receive a
sPAPREventLogEntry pointer as a parameter instead of a void pointer.
- the endianness of the sPAPREventLogEntry header is now native instead
of be32. We can use the fields in native endianness internally and write
them as be32 in the guest physical memory inside 'check_exception'. This
allows the VMSD inside spapr.c to read the correct size of the
extended_log field.
- inside spapr.c, pending_events is put in a subsection in the spapr state
VMSD to make sure migration across different versions is not broken.
A small change in rtas_event_log_queue and rtas_event_log_dequeue was also
made: instead of calling qdev_get_machine(), both functions now receive
a pointer to the sPAPRMachineState. This pointer is already available in
the callers of these functions, so we don't need to waste resources
calling qdev_get_machine() again.
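Deriving the log type from the header instead of storing a separate log_type field could look roughly like this. The sketch assumes the type sits in the low byte of the summary word; the mask and type values, the struct and the function name are all illustrative assumptions, not taken from this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: log type in the low byte of the summary word. Illustrative. */
#define SK_LOG_TYPE_MASK    0x000000ffu
#define SK_LOG_TYPE_EPOW    0x00000040u
#define SK_LOG_TYPE_HOTPLUG 0x000000e5u

/* Hypothetical entry: native-endian header plus extended log buffer. */
typedef struct {
    uint32_t summary;
    uint32_t extended_length;  /* bytes in extended_log; VMSD buffer size */
    uint8_t *extended_log;
} SketchEventLogEntry;

/* Replaces a stored log_type: derive it from the header on demand. */
static uint32_t sketch_event_log_entry_type(const SketchEventLogEntry *e)
{
    return e->summary & SK_LOG_TYPE_MASK;
}
```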
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This finishes QOM'fication of IOMMUMemoryRegion by introducing
an IOMMUMemoryRegionClass. This also provides a fastpath analog for
IOMMU_MEMORY_REGION_GET_CLASS().
This makes IOMMUMemoryRegion an abstract class.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170711035620.4232-3-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.
This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
dynamic QOM casting in the fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides a new helper to
do a simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.
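The fast-path cast can be sketched with stripped-down hypothetical types: a plain flag test followed by a pointer cast, with the parent struct as the first member so the cast is valid. Following aliases before testing the flag is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down region type. */
typedef struct SketchMR {
    struct SketchMR *alias; /* non-NULL if this region aliases another */
    bool is_iommu;          /* set by the IOMMU subclass instance init */
} SketchMR;

/* The parent must be the first member so the cast below is valid. */
typedef struct {
    SketchMR parent_obj;
    int iommu_data;
} SketchIOMMUMR;

/* Flag test plus cast: no dynamic type lookup on the fast path. */
static SketchIOMMUMR *sketch_get_iommu(SketchMR *mr)
{
    if (mr->alias) {
        return sketch_get_iommu(mr->alias);
    }
    return mr->is_iommu ? (SketchIOMMUMR *)mr : NULL;
}

static bool sketch_is_iommu(SketchMR *mr)
{
    return sketch_get_iommu(mr) != NULL;
}
```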
This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20170711035620.4232-2-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility
(the former POWER8 interrupt model) or in XIVE exploitation mode (the
newer POWER9 interrupt model).
Bit 7 of Byte 23 of vector 5 is used for this purpose.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into
configured state initially, instead of the usual ISOLATED/UNUSABLE state.
It turns out this is unnecessary: although coldplugged devices do need to
be in CONFIGURED state once the guest starts, that will already be
accomplished by the reset code which will move DRCs for already plugged
devices into a coldplug equivalent state.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
At the moment, spapr_drc_release() has an ugly switch on the DRC type to
call the right, device-specific release function. This cleans it up by
doing that via a proper QOM method.
It's still arguably an abstraction violation for the DRC code to call into
the specific device code, but one mess at a time.
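Replacing a type switch with a class method amounts to calling through a function pointer stored per DRC type. A minimal sketch with hypothetical types and names (QOM classes carry much more than this):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device and per-type class. */
typedef struct { int released; } SketchDevice;

typedef struct {
    const char *type_name;
    void (*release)(SketchDevice *dev); /* device-specific hook */
} SketchDRCClass;

typedef struct {
    const SketchDRCClass *klass;
    SketchDevice *dev;
} SketchDRC;

static void sketch_cpu_release(SketchDevice *dev) { dev->released = 1; }
static void sketch_lmb_release(SketchDevice *dev) { dev->released = 2; }

static const SketchDRCClass sketch_cpu_class = { "cpu", sketch_cpu_release };
static const SketchDRCClass sketch_lmb_class = { "lmb", sketch_lmb_release };

/* No switch on the DRC type: just call the class method. */
static void sketch_drc_release(SketchDRC *drc)
{
    assert(drc->klass->release != NULL);
    drc->klass->release(drc->dev);
    drc->dev = NULL;
}
```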
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
We have more of these since the addition of KVMPPC_H_LOGICAL_MEMOP in 2012.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit ff9006ddbf ("spapr: move spapr_core_[foo]plug() callbacks
close to machine code in spapr.c"), this function doesn't need to be extern
anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are substantial differences in the various paths through
set_isolation_state(), both for setting to ISOLATED versus UNISOLATED
state and for logical versus physical DRCs.
So, split the set_isolation_state() method into isolate() and unisolate()
methods, and give them different implementations for the two DRC types.
Factor some minimal common checks, including for valid indicator values
(which we weren't previously checking) into rtas_set_isolation_state().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The allocation-state indicator should only actually be implemented for
"logical" DRCs, not physical ones. Factor a check for this, and also for
valid indicator state values into rtas_set_allocation_state(). Because
they don't exist for physical DRCs, there's no reason that we'd ever want
more than one method implementation, so it can just be a plain function.
In addition, the setting to USABLE and setting to UNUSABLE paths in
set_allocation_state() don't actually have much in common. So, split the
method into separate functions for each parameter value (drc_set_usable()
and drc_set_unusable()).
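A sketch of the resulting shape: one plain function does the common checks (logical DRC only, valid indicator value) and then calls a dedicated function per value. Return codes and indicator values are hypothetical, not the PAPR RTAS ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values, not the PAPR RTAS return codes. */
enum { SK_OK = 0, SK_PARAM_ERROR = -1, SK_NO_SUCH_INDICATOR = -2 };
enum { SK_ALLOC_UNUSABLE = 0, SK_ALLOC_USABLE = 1 };

typedef struct {
    bool logical;  /* logical DRC, as opposed to a physical (PCI) one */
    bool usable;
} SketchDRC;

static int sketch_drc_set_usable(SketchDRC *drc)
{
    drc->usable = true;
    return SK_OK;
}

static int sketch_drc_set_unusable(SketchDRC *drc)
{
    drc->usable = false;
    return SK_OK;
}

/* Common checks once, then a direct call per requested value. */
static int sketch_rtas_set_allocation_state(SketchDRC *drc, unsigned state)
{
    if (!drc->logical) {
        return SK_NO_SUCH_INDICATOR; /* physical DRCs lack this indicator */
    }
    switch (state) {
    case SK_ALLOC_USABLE:
        return sketch_drc_set_usable(drc);
    case SK_ALLOC_UNUSABLE:
        return sketch_drc_set_unusable(drc);
    default:
        return SK_PARAM_ERROR;
    }
}
```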
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The 'signalled' field in the DRC appears to be entirely a tortuous
workaround for the fact that PCI devices were started in UNISOLATED state
for unclear reasons.
1) 'signalled' is already meaningless for logical (so far, all non-PCI)
DRCs. It's always set to true (at least at any point it might be tested),
and can't be assigned any real meaning due to the way signalling works for
logical DRCs.
2) For PCI DRCs, the only time signalled would be false is when non-zero
functions of a multifunction device are hotplugged, followed by function
zero (the other way around is explicitly not permitted). In that case the
secondary function DRCs are attached, but the notification isn't sent to
the guest until function 0 is plugged.
3) signalled being false is used to allow a DRC detach to switch mode
back to ISOLATED state, which allows a secondary function to be hotplugged
then unplugged with function 0 never inserted. Without this a secondary
function starting in UNISOLATED state couldn't be detached again without
function 0 being inserted, all the functions configured by the guest, then
sent back to ISOLATED state.
4) But now that PCI DRCs start in ISOLATED state, there's nothing to be
done. If the guest doesn't get the notification, it won't switch the
device to UNISOLATED state, so nothing prevents it from being unplugged.
If the guest does move it to UNISOLATED state without the signal (due to
a manual drmgr call, for instance) then it really isn't safe to unplug it.
So, this patch removes the signalled variable and all code related to it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Commit 5bc8d26de2 ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
This is an improvement since we no longer allocate ICPState objects
that will never be used. But it has the side-effect of breaking
migration of older machine types from older QEMU versions.
This patch allows spapr to register dummy "icp/server" entries to vmstate.
These entries use a dedicated VMStateDescription that can swallow and
discard state of an incoming migration stream, and that don't send anything
on outgoing migration.
As for real ICPState objects, the instance_id is the cpu_index of the
corresponding vCPU, which happens to be equal to the generated instance_id
of older machine types.
The machine can unregister/register these entries when CPUs are dynamically
plugged/unplugged.
This is only available for pseries-2.9 and older machines, thanks to a
compat property.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Server class POWER CPUs have a "compat" property, which is used to set the
backwards compatibility mode for the processor. However, this only makes
sense for machine types which don't give the guest access to hypervisor
privilege - otherwise the compatibility level is under the guest's control.
To reflect this, this removes the CPU 'compat' property and instead
creates a 'max-cpu-compat' property on the pseries machine. Strictly
speaking this breaks compatibility, but AFAIK the 'compat' option was
never (directly) used with -device or device_add.
The option was used with -cpu. So, to maintain compatibility, this
patch adds a hack to the cpu option parsing to strip out any compat
options supplied with -cpu and set them on the machine property
instead of the now deprecated cpu property.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Andrea Bolognani <abologna@redhat.com>
This reverts commit fe6824d126.
Conflicts hw/ppc/spapr_drc.c, because get_index() has been renamed
spapr_get_index().
This didn't fix the problem. Once the hotplug has been started,
some memory and some structures are allocated.
We don't free it when we ignore the unplug, and we can't because
they can be in use by the kernel.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The cpu_setup() handler is only implemented by xics_kvm, where it really
does a typical "realize" job. Moreover, the realize() handler is called
shortly after cpu_setup(), on the same path.
This patch converts xics_kvm to implement realize() instead of cpu_setup().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Until recently, spapr used to allocate ICPState objects for the lifetime
of the machine. They would only be associated with vCPUs in xics_cpu_setup()
when plugging a CPU core.
Now that ICPState objects have the same lifecycle as vCPUs, it is
possible to associate them during realization.
This patch hence open-codes xics_cpu_setup() in icp_realize(). The vCPU
is passed as a property. Note that vCPU now needs to be realized first
for the IRQs to be allocated. It also needs to be reset before ICPState
realization in order to synchronize with KVM.
Since ICPState objects are freed when unrealized, xics_cpu_destroy() isn't
needed anymore and can be safely dropped.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It makes more sense to pass an ICPState * to handlers of ICPStateClass
instead of a DeviceState *, if only to benefit from compile time type
checking. The same goes with ICSStateClass.
While here, we also change the declaration of ICPStateClass in xics.h
for consistency.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These properties are part of the XICS API. They deserve to appear
explicitly in the XICS header file.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Taking into account that qemu_set_irq() returns immediately if its first
argument is NULL, icp_kvm_reset() largely duplicates icp_reset().
This patch introduces a reset() handler, so that the common logic can
be implemented in icp_reset() only.
While there we can also drop icp_kvm_realize() and icp_kvm_unrealize(). This
causes icp-kvm to be realized in icp_realize(), which sets icp->xics, but
it has no impact.
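The split described above can be sketched with simplified stand-ins for qemu_irq and ICPState. IRQLine and all names ending in _sketch are hypothetical. The common reset logic lives in one function that tolerates an unwired (NULL) output line. A backend such as KVM then only supplies an optional hook for its extra step. The reset values used here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for qemu_irq: NULL means "not wired". */
typedef struct IRQLine {
    int level;
} IRQLine;

/* Mirrors the property relied on above: a NULL line is a no-op. */
static void set_irq_sketch(IRQLine *irq, int level)
{
    if (irq == NULL) {
        return;
    }
    irq->level = level;
}

typedef struct ICPState ICPState;

typedef struct ICPStateClass {
    void (*reset)(ICPState *icp);   /* optional backend hook */
} ICPStateClass;

struct ICPState {
    const ICPStateClass *klass;
    IRQLine *output;                /* NULL when no IRQ line is wired */
    uint32_t xirr;
    uint8_t pending_priority;
    uint8_t mfrr;
};

/* Common logic is written once; the backend only adds its extra step. */
static void icp_reset_sketch(ICPState *icp)
{
    icp->xirr = 0;
    icp->pending_priority = 0xff;
    icp->mfrr = 0xff;
    set_irq_sketch(icp->output, 0);

    if (icp->klass != NULL && icp->klass->reset != NULL) {
        icp->klass->reset(icp);
    }
}

/* A KVM-like backend: counts calls where real code would sync KVM. */
static int kvm_hook_calls;

static void icp_kvm_reset_sketch(ICPState *icp)
{
    (void)icp;
    kvm_hook_calls++;
}

static const ICPStateClass icp_kvm_class = { .reset = icp_kvm_reset_sketch };
```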
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DRC objects have a get_name method which returns the DRC name generated
when the DRC is created. Replace that with a fixed spapr_drc_name()
function which generates the name on the fly from other information. This
means:
* We get rid of a method with only one implementation, and only local
callers
* We don't have to carry the name string around for the lifetime of the
DRC
* We use information added to the class structure to generate the name
in standard format, so we don't need an explicit switch on drc type
any more
We also eliminate the 'name' property; it's basically useless since the
only information in it can easily be deduced from other things.
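Generating the name on demand from per-class data can be sketched as follows, assuming each DRC subclass carries a name prefix and each DRC an integer id. The DRCClassSketch and DRCSketch types, the drc_name_sketch() helper and the prefixes "CPU " and "LMB " are assumptions for illustration, not QEMU's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical class data: each DRC subclass stores its name prefix. */
typedef struct DRCClassSketch {
    const char *name_prefix;   /* assumed values, e.g. "CPU " or "LMB " */
} DRCClassSketch;

typedef struct DRCSketch {
    const DRCClassSketch *klass;
    uint32_t id;               /* per-type index of this connector */
} DRCSketch;

/* Build the name on demand instead of storing a string in every DRC.
 * Returns the snprintf() result so callers can detect truncation. */
static int drc_name_sketch(const DRCSketch *drc, char *buf, size_t len)
{
    assert(drc != NULL && drc->klass != NULL && drc->klass->name_prefix);
    return snprintf(buf, len, "%s%u", drc->klass->name_prefix,
                    (unsigned)drc->id);
}

static const DRCClassSketch drc_cpu_class = { .name_prefix = "CPU " };
static const DRCClassSketch drc_lmb_class = { .name_prefix = "LMB " };
```

No switch on the DRC type is needed, and nothing has to be freed when the DRC goes away.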
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
DRC objects have attach & detach methods, but there's only one
implementation. Although there are some differences in its behaviour for
different DRC types, the overall structure is the same, so while we might
want different method implementations for some parts, we're unlikely to
want them for the top-level functions.
So, replace them with direct function calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
There are 3 types of "indicator" associated with hotplug in the PAPR spec:
the "allocation state", "isolation state" and "DR-indicator". The first
two are intimately tied to the various state transitions associated with
hotplug. The DR-indicator, however, is different and simpler.
It's basically just a guest-controlled variable which can be used by the
guest to flag state or problems associated with a device. The idea is that
the hypervisor can use it to present information back on management
consoles (on some machines with PowerVM it may even control physical LEDs
on the machine case associated with the relevant device).
For that reason, there's only ever likely to be a single update
implementation so the set_indicator_state method isn't useful. Replace it
with a direct function call.
While we're there, make some small associated cleanups:
* PAPR doesn't use the term "indicator state", just "DR-indicator" and
the allocation state and isolation state are also considered "indicators".
Rename things to be less confusing.
* Fold set_indicator_state() and rtas_set_indicator_state() into a single
rtas_set_dr_indicator() function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
DRC classes have an entity_sense method to determine (in a specific PAPR
sense) the presence or absence of a device plugged into a DRC. However,
we only have one implementation of the method, which explicitly tests for
different DRC types. This changes it to instead have different method
implementations for the two cases: "logical" and "physical" DRCs.
While we're at it, the entity sense method always returns RTAS_OUT_SUCCESS,
and the interesting value is returned via pass-by-reference. Simplify this
to directly return the value we care about.
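The two method implementations, each returning the sense value directly, can be sketched as follows. The enum values are modelled loosely on PAPR's dr-entity-sense results and, like the type and function names, are assumptions here. The logical variant also omits any extra checks (such as allocation state) that a real implementation might make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sense values; numeric values are assumptions. */
typedef enum {
    SENSE_EMPTY = 0,
    SENSE_PRESENT = 1,
    SENSE_UNUSABLE = 2,
} EntitySense;

typedef struct DRCSketch DRCSketch;

typedef struct DRCClassSketch {
    /* Returns the sense value itself: no status code, no out-parameter. */
    EntitySense (*dr_entity_sense)(const DRCSketch *drc);
} DRCClassSketch;

struct DRCSketch {
    const DRCClassSketch *klass;
    void *dev;               /* non-NULL when a device is attached */
};

/* Physical DRCs (e.g. PCI slots): an unoccupied slot is EMPTY. */
static EntitySense physical_entity_sense(const DRCSketch *drc)
{
    assert(drc != NULL);
    return drc->dev ? SENSE_PRESENT : SENSE_EMPTY;
}

/* Logical DRCs (e.g. CPUs, memory): without a device it is UNUSABLE. */
static EntitySense logical_entity_sense(const DRCSketch *drc)
{
    assert(drc != NULL);
    return drc->dev ? SENSE_PRESENT : SENSE_UNUSABLE;
}

static const DRCClassSketch physical_class = { physical_entity_sense };
static const DRCClassSketch logical_class = { logical_entity_sense };
```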
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This function was used in generating the device tree. However, now that
we have different QOM types for different DRC types we can easily store
the information we need in the class structure and avoid this specialized
lookup function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector
structures which store intermediate state for the ibm,configure-connector
RTAS call.
This was an attempt to separate this state from the core of the DRC state.
However the configure connector process is intimately tied to the DRC
model, so there's really no point trying to have two levels of interface
here.
Moving the configure-connector state into its corresponding DRC allows
removal of a number of helpers for maintaining the ancillary list.
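Keeping the configure-connector progress inside the DRC itself can be sketched as follows. The sketch assumes a single pointer field that is allocated on the first call and freed when the sequence ends. The structure layout and helper names are hypothetical. The point is that looking up the state becomes a field access instead of a search through a machine-wide list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical progress state for ibm,configure-connector: where we are
 * in walking the device's FDT fragment across repeated RTAS calls. */
typedef struct ConfigureConnectorState {
    int fdt_offset;
    int fdt_depth;
} ConfigureConnectorState;

typedef struct DRCSketch {
    unsigned index;
    ConfigureConnectorState *ccs;   /* NULL when no call is in progress */
} DRCSketch;

/* Fetch (creating on first use) the in-progress state for this DRC.
 * Returns NULL only if allocation fails. */
static ConfigureConnectorState *drc_get_ccs(DRCSketch *drc)
{
    assert(drc != NULL);
    if (drc->ccs == NULL) {
        drc->ccs = calloc(1, sizeof(*drc->ccs));
    }
    return drc->ccs;
}

/* Called when the sequence completes: the state dies with it. */
static void drc_finish_ccs(DRCSketch *drc)
{
    assert(drc != NULL);
    free(drc->ccs);
    drc->ccs = NULL;
}
```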
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
* Change names to something less ludicrously verbose
* Now that we have QOM subclasses for the different DRC types, use a QOM
typename instead of a PAPR type value parameter
The latter allows removal of the get_type_shift() helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently we only have a single QOM type for all DRCs, but lots of
places where we switch behaviour based on the DRC's PAPR defined type.
This is a poor use of our existing type system.
So, instead create QOM subclasses for each PAPR defined DRC type. We
also introduce intermediate subclasses for physical and logical DRCs,
a division which will be useful later on.
Instead of being stored in the DRC object itself, the PAPR type is now
stored in the class structure. There are still many places where we
switch directly on the PAPR type value, but this at least provides the
basis to start to remove those.
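The class layout described above can be sketched in plain C, with an explicit parent pointer standing in for QOM inheritance. The type names, the papr_type field and the DRC_TYPE_* values are placeholders, not QEMU's or PAPR's real identifiers and values. The sketch shows two things: the PAPR type is read from the class rather than the object, and the logical/physical split is an intermediate level in the hierarchy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder type values for this sketch only. */
enum {
    DRC_TYPE_CPU = 1,
    DRC_TYPE_PCI = 2,
    DRC_TYPE_LMB = 3,
};

typedef struct DRCClassSketch {
    const struct DRCClassSketch *parent;  /* NULL for the base class */
    const char *name;
    uint32_t papr_type;                   /* stored per class, not per object */
} DRCClassSketch;

typedef struct DRCSketch {
    const DRCClassSketch *klass;          /* the object holds no type field */
    uint32_t id;
} DRCSketch;

static const DRCClassSketch drc_base = { NULL, "drc", 0 };
static const DRCClassSketch drc_physical = { &drc_base, "drc-physical", 0 };
static const DRCClassSketch drc_logical = { &drc_base, "drc-logical", 0 };
static const DRCClassSketch drc_pci = { &drc_physical, "drc-pci", DRC_TYPE_PCI };
static const DRCClassSketch drc_cpu = { &drc_logical, "drc-cpu", DRC_TYPE_CPU };
static const DRCClassSketch drc_lmb = { &drc_logical, "drc-lmb", DRC_TYPE_LMB };

static uint32_t drc_type(const DRCSketch *drc)
{
    assert(drc != NULL && drc->klass != NULL);
    return drc->klass->papr_type;
}

/* Walk up the hierarchy, similar in spirit to a QOM dynamic cast check. */
static int drc_is_a(const DRCSketch *drc, const DRCClassSketch *k)
{
    for (const DRCClassSketch *c = drc->klass; c != NULL; c = c->parent) {
        if (c == k) {
            return 1;
        }
    }
    return 0;
}
```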
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
These two methods only have one implementation, and the spec they're
implementing means any other implementation is unlikely, verging on
impossible.
So replace them with simple functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
DRConnectorClass has a set_configured method, however:
* There is only one implementation, and only ever likely to be one
* There's exactly one caller, and that's (now) local
* The implementation is very straightforward
So abolish the method entirely, and just open-code what we need.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The DRConnectorClass includes a get_fdt method. However
* There's only one implementation, and there's only likely to ever be one
* Both callers are local to spapr_drc
* Each caller only uses one half of the actual implementation
So abolish get_fdt() entirely, and just open-code what we need.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The pointer drc->detach_cb is being used as a way of informing
the detach() function inside spapr_drc.c which callback to execute. This
information can also be retrieved simply by checking drc->type and
choosing the right callback based on it. In this context, detach_cb
is redundant information that must be managed.
After the previous spapr_lmb_release change, no detach_cb_opaques
are being used by any of the three callback functions. This is
yet another piece of information that is now unused and, on top of that,
can't be migrated either.
This patch makes the following changes:
- removal of detach_cb_opaque. The 'opaque' argument was removed from
the callbacks and from the detach() function of sPAPRConnectorClass. The
attribute detach_cb_opaque of sPAPRConnector was removed.
- removal of detach_cb from the detach() call. The function pointer
detach_cb of sPAPRConnector was removed. detach() now uses a
switch(drc->type) to execute the appropriate callback. To achieve this,
spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
callbacks were made public to be visible inside detach().
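The resulting shape of detach() can be sketched with simplified types. The callback is chosen by switching on the DRC type instead of being read from a stored detach_cb pointer, and the callbacks take no opaque argument. The enum, the counters standing in for the real release work, and all _sketch names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DRC kinds; the real code switches on the PAPR type. */
typedef enum { DRC_CPU, DRC_LMB, DRC_PCI } DRCType;

typedef struct DRCSketch {
    DRCType type;
    void *dev;      /* attached device, cleared once released */
} DRCSketch;

/* Counters stand in for the real release work in this sketch. */
static int cpu_released, lmb_released, pci_released;

/* No opaque parameter: in this sketch the device is all they need. */
static void core_release_sketch(void *dev) { (void)dev; cpu_released++; }
static void lmb_release_sketch(void *dev) { (void)dev; lmb_released++; }
static void pci_remove_sketch(void *dev) { (void)dev; pci_released++; }

/* No stored detach_cb: the DRC type alone selects the callback. */
static void drc_detach_sketch(DRCSketch *drc)
{
    assert(drc != NULL && drc->dev != NULL);
    switch (drc->type) {
    case DRC_CPU:
        core_release_sketch(drc->dev);
        break;
    case DRC_LMB:
        lmb_release_sketch(drc->dev);
        break;
    case DRC_PCI:
        pci_remove_sketch(drc->dev);
        break;
    }
    drc->dev = NULL;
}
```

Nothing callback-related is stored in the DRC, so there is also nothing extra to carry across migration.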
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>