This pull requests supersedes the one from 2017-07-14. That one had a
couple of subtle regressions: there was a build error for mingw32, and
an instance_size which was theoretically wrong everywhere, but only
actually bit on the Travis OSX build.
There are two major batches in this set, rather than the usual
collection of assorted fixes.
* More DRC cleanup. This gets the state management into a state
which should fix many of the hotplug+migration problems we've
had. Plus it gets the migration stream format into something
well defined and pretty minimal which we can reasonably support
into the future.
* Hashed Page Table resizing. It's been a while since this was
posted, but it's been through several previous rounds of review.
The kernel parts (both guest and host) are merged in 4.11, so
this is the only remaining piece left to allow resizing of the
HPT in a running guest.
There are also a handful of unrelated fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAllsWwQACgkQbDjKyiDZ
s5LMnA//dpoqWrTPiEmx2DsXMkjLefn/2Yl1dkQDzhyb7v+tNGFYmxpbb7nPRfJE
tfvcKu1Tz23NPOp6+1VC9eTyTO1YOXTgvQrNSbF1MmIg4PGN6s2DHrLviAqCS15M
29x6+RdRaeLUSCsk8elsViiWb8h7cISDuN0SMA0WWjWP3bO/drz5nq5z5dRgdVFe
Z5O0qwDNoN0NypJ68Cld+riP1uDAYMONPxA0QOWCLx8qowoJ3hYMuyNnqBQU5OJn
PpAA3EfdxkN6rtaBjDt7xHkJfm9Xkm9SsT8qTcj/R2JjkENef8EbzrdjFE+pSVz0
7c9C4evgYgmhUCUFvnZfgN+VBL1lS/p5UGnFPyNQ7KbSXDE71OAgWH/f/7kzsJPy
MxbJWM6eUN9Ny0APxM8olLV1FM4GzEoCSLfDVhStrdJ6P5wBmjLSugqSOLB8aMtd
8NwBY06nTpmo9xXGz9enLUWlpSeoReKU3TxvQvY+JcOWWpasDZOO4zD8B3bdLbA/
I8jdkH5Vs0pyPLaWD+1FxlQvlF45CuwpwoiAz00V2XkkMu8jKCGsQ0iuqXorSqvs
/7tQ1pHlUybAX+5W9raaJmphgc4gk33P3PlQCjhgYzxRu4yzRsEzS9hahoO/TAmq
Y70CooZaaeGNOBEDcKLZEzJdBr52cqW4MM8t1xHWTg3VCHJGeYI=
=O6NQ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170717' into staging
ppc patch queue 2017-07-17
This pull requests supersedes the one from 2017-07-14. That one had a
couple of subtle regressions: there was a build error for mingw32, and
an instance_size which was theoretically wrong everywhere, but only
actually bit on the Travis OSX build.
There are two major batches in this set, rather than the usual
collection of assorted fixes.
* More DRC cleanup. This gets the state management into a state
which should fix many of the hotplug+migration problems we've
had. Plus it gets the migration stream format into something
well defined and pretty minimal which we can reasonably support
into the future.
* Hashed Page Table resizing. It's been a while since this was
posted, but it's been through several previous rounds of review.
The kernel parts (both guest and host) are merged in 4.11, so
this is the only remaining piece left to allow resizing of the
HPT in a running guest.
There are also a handful of unrelated fixes.
# gpg: Signature made Mon 17 Jul 2017 07:36:52 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.10-20170717: (21 commits)
target/ppc: fix CPU hotplug when radix is enabled (TCG)
spapr: fix memory leak in spapr_core_pre_plug()
pseries: Allow HPT resizing with KVM
pseries: Use smaller default hash page tables when guest can resize
pseries: Enable HPT resizing for 2.10
pseries: Implement HPT resizing
pseries: Stubs for HPT resizing
ppc/pnv: Remove unused XICSState reference
spapr: fix potential memory leak in spapr_core_plug()
spapr: Implement DR-indicator for physical DRCs only
spapr: Remove sPAPRConfigureConnectorState sub-structure
spapr: Consolidate DRC state variables
spapr: Cleanups relating to DRC awaiting_release field
spapr: Refactor spapr_drc_detach()
spapr: Abort on delete failure in spapr_drc_release()
spapr: Simplify unplug path
spapr: Remove 'awaiting_allocation' DRC flag
spapr: Treat devices added before inbound migration as coldplugged
spapr: Minor cleanups to events handling
spapr: migrate pending_events of spapr state
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
If the cursor resource id isn't set the guest didn't define a cursor.
Skip the cursor update in post_load in that that case.
Reported-by: wanghaibin <wanghaibin.wang@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: wanghaibin <wanghaibin.wang@huawei.com>
Message-id: 20170710070432.856-1-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
In case of error, we must ensure the dynamically allocated base_core_type
is freed, like it is done everywhere else in this function.
This is a regression introduced in QEMU 2.9 by commit 8149e2992f.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
So far, qemu implements the PAPR Hash Page Table (HPT) resizing extension
with TCG. The same implementation will work with KVM PR, but we don't
currently allow that. For KVM HV we can only implement resizing with the
assistance of the host kernel, which needs a new capability and ioctl()s.
This patch adds support for testing the new KVM capability and implementing
the resize in terms of KVM facilities when necessary. If we're running on
a kernel which doesn't have the new capability flag at all, we fall back to
testing for PR vs. HV KVM using the same hack that we already use in a
number of places for older kernels.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've now implemented a PAPR extension allowing PAPR guest to resize
their hash page table (HPT) during runtime.
This patch makes use of that facility to allocate smaller HPTs by default.
Specifically when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.
When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.
If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maxmimum
memory size (unless it's been configured not to allow such guests at all).
For now we make that reallocation assuming the guest has not yet used the
HPT at all. That's true in practice, but not, strictly, an architectural
or PAPR requirement. If we need to in future we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
We've now implemented a PAPR extensions which allows PAPR guests (i.e.
"pseries" machine type) to resize their hash page table during runtime.
However, that extension is only enabled if explicitly chosen on the
command line. This patch enables it by default for spapr-2.10, but leaves
it disabled (by default) for older machine types.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow run time resizing of a guest's hash page table. It
also adds a new machine property for controlling whether this new
facility is available.
For now we only allow resizing with TCG, allowing it with KVM will require
kernel changes as well.
Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls. This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Since commit 5c1da81215 ("spapr: Remove unnecessary differences between
hotplug and coldplug paths"), the CPU DT for the DRC is always allocated.
This causes a memory leak for pseries-2.6 and older machine types, that
don't support CPU hotplug and don't allocate DRCs for CPUs.
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to PAPR, the DR-indicator should only be valid for physical DRCs,
not logical DRCs. At the moment we implement it for all DRCs, so restrict
it to physical ones only.
We move the state to the physical DRC subclass, which means adding some
QOM boilerplate to handle the newly distinct type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Most of the time, the state of a DRC object is contained in the single
'state' variable. However, during the transition from UNISOLATE to
CONFIGURED state requires multiple calls to the ibm,configure-connector
RTAS call to retrieve the device tree for the attached device. We need
some extra state to keep track of where we're up to in delivering the
device tree information to the guest.
Currently that extra state is in a sPAPRConfigureConnectorState
substructure which is only allocated when we're in the middle of the
configure connector process. That sounds like a good idea, but the extra
state is only two integers - on many platforms that will take up the same
room as the (maybe NULL) ccs pointer even before malloc() overhead. Plus
it's another object whose lifetime we need to manage. In short, it's not
worth it.
So, fold the sPAPRConfigureConnectorState substructure directly into the
DRC object.
Previously the structure was allocated lazily when the configure-connector
call discovers it's not there. Now, we need to initialize the subfields
pre-emptively, as soon as we enter UNISOLATE state.
Although it's not strictly necessary (the field values should only ever
be consulted when in UNISOLATE state), we try to keep them at -1 when in
other states, as a debugging aid.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Each DRC has three fields describing its state: isolation_state,
allocation_state and configured. At first this seems like a reasonable
representation, since its based directly on the PAPR defined
isolation-state and allocation-state indicators. However:
* Only a few combinations of the two fields' values are permitted
* allocation_state isn't used at all for physical DRCs
* The indicators are write only so they don't really have a well
defined current value independent of each other
This replaces these variables with a single state variable, whose names
and numbers are based on the diagram in LoPAPR section 13.4. Along with
this we add code to check the current state on various operations and make
sure the requested transition is permitted.
Strictly speaking, this makes guest visible changes to behaviour (since we
probably allowed some transitions we shouldn't have before). However, a
hypothetical guest broken by that wasn't PAPR compliant, and probably
wouldn't have worked under PowerVM.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
'awaiting_release' indicates that the host has requested an unplug of the
device attached to the DRC, but the guest has not (yet) put the device
into a state where it is safe to complete removal.
1. Rename it to 'unplug_requested' which to me at least is clearer
2. Remove the ->release_pending() method used to check this from outside
spapr_drc.c. The method only plausibly has one implementation, so use
a plain function (spapr_drc_unplug_requested()) instead.
3. Remove it from the migration stream. Attempting to migrate mid-unplug
is broken not just for spapr - in general management has no good way to
determine if the device should be present on the destination or not. So,
until that's fixed, there's no point adding extra things to the stream.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
This function has two unused parameters - remove them.
It also sets awaiting_release on all paths, except one. On that path
setting it is harmless, since it will be immediately cleared by
spapr_drc_release(). So factor it out of the if statements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
We currently ignore errors from the object_property_del() in
spapr_drc_release(). But the only way that could fail is if the property
doesn't exist, in which case it's a bug that we're in spapr_drc_release()
at all. So change from ignoring to abort()ing on errors.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_lmb_release() and spapr_core_release() call hotplug_handler_unplug()
which after a bunch of indirection calls spapr_memory_unplug() or
spapr_core_unplug(). But we already know which is the appropriate thing
to call here, so we can just fold it directly into the release function.
Once that's done, there's no need for an hc->unplug method in the spapr
machine at all: since we also have an hc->unplug_request method, the
hotplug core will never use ->unplug.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The awaiting_allocation flag in the DRC was introduced by aab9913
"spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to
prevent a guest crash on racing attach and detach. Except.. information
from the BZ actually suggests a qemu crash, not a guest crash. And there
shouldn't be a problem here anyway: if the guest has already moved the DRC
away from UNUSABLE state, the detach would already be deferred, and if it
hadn't it should be safe to detach it (the guest should fail gracefully
when it attempts to change the allocation state).
I think this was probably just a bandaid for some other problem in the
state management. So, remove awaiting_allocation and associated code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
When migrating a guest which has already had devices hotplugged,
libvirt typically starts the destination qemu with -incoming defer,
adds those hotplugged devices with qmp, then initiates the incoming
migration.
This causes problems for the management of spapr DRC state. Because
the device is treated as hotplugged, it goes into a DRC state for a
device immediately after it's plugged, but before the guest has
acknowledged its presence. However, chances are the guest on the
source machine *has* acknowledged the device's presence and configured
it.
If the source has fully configured the device, then DRC state won't be
sent in the migration stream: for maximum migration compatibility with
earlier versions we don't migrate DRCs in coldplug-equivalent state.
That means that the DRC effectively changes state over the migrate,
causing problems later on.
In addition, logging hotplug events for these devices isn't what we
want because a) those events should already have been issued on the
source host and b) the event queue should get wiped out by the
incoming state anyway.
In short, what we really want is to treat devices added before an
incoming migration as if they were coldplugged.
To do this, we first add a spapr_drc_hotplugged() helper which
determines if the device is hotplugged in the sense relevant for DRC
state management. We only send hotplug events when this is true.
Second, when we add a device which isn't hotplugged in this sense, we
force a reset of the DRC state - this ensures the DRC is in a
coldplug-equivalent state (there isn't usually a system reset between
these device adds and the incoming migration).
This is based on an earlier patch by Laurent Vivier, cleaned up and
extended.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The rtas_error_log structure is marked packed, which strongly suggests its
precise layout is important to match an external interface. Along with
that one could expect it to have a fixed endianness to match the same
interface. That used to be the case - matching the layout of PAPR RTAS
event format and requiring BE fields.
Now, however, it's only used embedded within sPAPREventLogEntry with the
fields in native order, since they're processed internally.
Clear that up by removing the nested structure in sPAPREventLogEntry.
struct rtas_error_log is moved back to spapr_events.c where it is used as
a temporary to help convert the fields in sPAPREventLogEntry to the correct
in memory format when delivering an event to the guest.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In racing situations between hotplug events and migration operation,
a rtas hotplug event could have not yet be delivered to the source
guest when migration is started. In this case the pending_events of
spapr state need be transmitted to the target so that the hotplug
event can be finished on the target.
To achieve the minimal VMSD possible to migrate the pending_events list,
this patch makes the changes in spapr_events.c:
- 'log_type' of sPAPREventLogEntry struct deleted. This information can be
derived by inspecting the rtas_error_log summary field. A new function
called 'spapr_event_log_entry_type' was added to retrieve the type of
a given sPAPREventLogEntry.
- sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The
only data we're going to migrate in the VMSD is the event log data itself,
which can be divided in two parts: a rtas_error_log header and an extended
event log field. The rtas_error_log header contains information about the
size of the extended log field, which can be used inside VMSD as the size
parameter of the VBUFFER_ALOC field that will store it. To allow this use,
the header.extended_length field must be exposed inline to the VMSD instead
of embedded into a 'data' field that holds everything. With this in mind,
the following changes were done:
* a new 'header' field was added to sPAPREventLogEntry. This field holds a
a struct rtas_error_log inline.
* the declaration of the 'rtas_error_log' struct was moved to spapr.h
to be visible to the VMSD macros.
* 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and
now holds only the contents of the extended event log.
* 'struct rtas_error_log hdr' were taken away from both epow_log_full
and hp_log_full. This information is now available at the header field of
sPAPREventLogEntry.
* epow_log_full and hp_log_full were renamed to epow_extended_log and
hp_extended_log respectively. This rename makes it clearer to understand
the new purpose of both structures: hold the information of an extended
event log field.
* spapr_powerdown_req and spapr_hotplug_req_event now creates a
sPAPREventLogEntry structure that contains the full rtas log entry.
* rtas_event_log_queue and rtas_event_log_dequeue now receives a
sPAPREventLogEntry pointer as a parameter instead of a void pointer.
- the endianess of the sPAPREventLogEntry header is now native instead
of be32. We can use the fields in native endianess internally and write
them in be32 in the guest physical memory inside 'check_exception'. This
allows the VMSD inside spapr.c to read the correct size of the
entended_log field.
- inside spapr.c, pending_events is put in a subsection in the spapr state
VMSD to make sure migration across different versions is not broken.
A small change in rtas_event_log_queue and rtas_event_log_dequeue were also
made: instead of calling qdev_get_machine(), both functions now receive
a pointer to the sPAPRMachineState. This pointer is already available in
the callers of these functions and we don't need to waste resources
calling qdev() again.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All the DRC subtypes explicitly list instance_size in TypeInfo (all as
sizeof(sPAPRDRConnector). This isn't necessary, since if it's not listed
it will be derived from the parent type.
Worse, this is dangerous, because if a subtype is changed in future to
have a larger structure, then subtypes of that subtype also need to have
instance_size changed, or it will lead to hard to track memory corruption
bugs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Switch to memory_region_init_ram(), since we pass the same DeviceState
to both memory_region_init_ram_nomigrate() and vmstate_register_ram().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-11-git-send-email-peter.maydell@linaro.org
Since we pass the same DeviceState object to
memory_region_init_rom_nomigrate() and vmstate_register_ram(), we can
switch to using memory_region_init_rom() instead.
(This isn't entirely obvious from the code since it is using
&pdev->qdev rather than DEVICE(pdov) for some reason, but
PCIDevice does indeed use 'qdev' for its parent DeviceState member.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-10-git-send-email-peter.maydell@linaro.org
Since we pass the same DeviceState object to
memory_region_init_rom_device_nomigrate() and vmstate_register_ram(),
we can switch to using memory_region_init_rom_device() instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-9-git-send-email-peter.maydell@linaro.org
Use the new functions memory_region_init_{ram,rom,rom_device}()
instead of manually calling the _nomigrate() version and then
vmstate_register_ram_global().
Patch automatically created using coccinelle script:
spatch --in-place -sp_file scripts/coccinelle/memory-region-init-ram.cocci -dir hw
(As it turns out, there are no instances of the rom and
rom_device functions that are caught by this script.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-8-git-send-email-peter.maydell@linaro.org
Rename memory_region_init_rom() to memory_region_init_rom_nomigrate()
and memory_region_init_rom_device() to
memory_region_init_rom_device_nomigrate().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-5-git-send-email-peter.maydell@linaro.org
Rename memory_region_init_ram() to memory_region_init_ram_nomigrate().
This leaves the way clear for us to provide a memory_region_init_ram()
which does handle migration.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org
- add a network boot rom for s390 (Thomas Huth)
- migration of storage attributes like the CMMA used/unused state
- PCI related enhancements - full support for aen, ais and zpci
- migration support for css with vmstates (Halil Pasic)
- cpu model enhancements for cpu features
- guarded storage support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJZaJ3gAAoJEBF7vIC1phx8VSAP/1zKh7ti4Y2dIVb94c1tvECE
LRNdCdAPhEqL6zybty85aG04sjAmSu50NGfo5t8AGq1U9WBWrCy7/wWSFdK2GI63
Umc1fR7aBF9FiFayKONhExaREh6gSWVHZF1RyaPIWnnjRIeX8nqgPEnpdZNiVVrG
5cKHV2SUd5pMDJUiQdZGZgbgG1c+MWJx2BHoduM+K0UnmFjpyLCL4Rq58Q2Q87Nj
/+yPSVApFFeMsDpem6DNttE6Msa+V+K+EmRhRKqZNOWrdRKH5vvj6Fl/LSxVtd9c
CEG+aZGjFd693uP9ge0WmjeUJtVHIGt9xKdeU0d7FijZWehjsIqalLoqapzK8ddF
h6HJuNsmk/SZF7O9JsbHT3Epyr+7Hk0dx78Ku1GNQuUxtFL93eyIJmRdgz7Zo3Lj
ZTPJvCA13GjPWtgzG5dG3JH1hiAS+Yai18BgdzGbs+qfMCwPdbWkoqg7sARwAJNe
50fo/ayJvcmHJnSNO6hErFoU38WctGgO8fWp+oVvD8Um1ny1aBFFuJgJIMf47nhu
x1IdA6UGrNN0yNC4/UgyYBDV1hfvo/phMdoHqle9AcMmPYOD1DBr0genK/bYbICk
Dio7og9nKgheLRBHz2u5TuYcCsfE/7rtwZX+iXMvoC7VE7Dqs+Q7Zjwwwtwj4x9F
FwWuf/Bv1s6IkVLlP8Ow
=2bOV
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170714' into staging
s390x/kvm/migration/cpumodel: fixes, enhancements and cleanups
- add a network boot rom for s390 (Thomas Huth)
- migration of storage attributes like the CMMA used/unused state
- PCI related enhancements - full support for aen, ais and zpci
- migration support for css with vmstates (Halil Pasic)
- cpu model enhancements for cpu features
- guarded storage support
# gpg: Signature made Fri 14 Jul 2017 11:33:04 BST
# gpg: using RSA key 0x117BBC80B5A61C7C
# gpg: Good signature from "Christian Borntraeger (IBM) <borntraeger@de.ibm.com>"
# Primary key fingerprint: F922 9381 A334 08F9 DBAB FBCA 117B BC80 B5A6 1C7C
* remotes/borntraeger/tags/s390x-20170714: (40 commits)
s390x/gdb: add gs registers
s390x/arch_dump: also dump guarded storage control block
s390x/kvm: enable guarded storage
s390x/kvm: Enable KSS facility for nested virtualization
s390x/cpumodel: add esop/esop2 to z12 model
s390x/cpumodel: we are always in zarchitecture mode
s390x/cpumodel: wire up new hardware features
s390x/flic: migrate ais states
s390x/cpumodel: add zpci, aen and ais facilities
s390x: initialize cpu firstly
pc-bios/s390: rebuild s390-ccw.img
pc-bios/s390: add s390-netboot.img
pc-bios/s390-ccw: Link libnet into the netboot image and do the TFTP load
pc-bios/s390-ccw: Add virtio-net driver code
pc-bios/s390-ccw: Add core files for the network bootloading program
roms/SLOF: Update submodule to latest status
pc-bios/s390-ccw: Add code for virtio feature negotiation
pc-bios/s390-ccw: Remove unused structs from virtio.h
pc-bios/s390-ccw: Move byteswap functions to a separate header
pc-bios/s390-ccw: Add a write() function for stdio
...
Conflicts:
target/s390x/kvm.c
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Introduce guarded storage support for KVM guests on s390.
We need to enable the capability, extend machine check validity,
sigp store-additional-status-at-address, and migration.
The feature is fenced for older machine type versions.
Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add esop and esop2 features to z12 model where esop2 was originally
introduced. Disable esop and esop2 when using compatibility machine
v2.9 or earlier.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
During migration we should transfer ais states to the target guest.
This patch introduces a subsection to kvm_s390_flic_vmstate and new
vmsd for qemu_flic. The ais states need to be migrated only when
ais is supported.
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
zPCI instructions and facilities are available since IBM zEnterprise
EC12. To support z/PCI in QEMU we enable zpci, aen and ais facilities
starting with zEC12 GA1. And we always set zpci and aen bits in max cpu
model. Later they might be switched off due to applied real cpu model.
For ais bit, we only provide it in the full cpu model beginning with
zEC12 and defer its enablement in the default cpu model to a later point
in time. At the same time, disable them for 2.9 and older machines.
Because of introducing AIS facility, we could check if it's enabled to
initialize flic->ais_supported with the real value.
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
By initializing the CPU firstly, we are able to retrieve and use the
CPU model features when initializing other subsystem or devices.
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Instead of passing around a pointer to ORB let us simplify some
function signatures by using the previously introduced ORB saved at the
subchannel (SubchDev).
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <20170711145441.33925-7-pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Turn on migration for the channel subsystem for the next machine. For
legacy machines we still have to do things the old way.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <20170711145441.33925-6-pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Since we are going to need a migration compatibility breaking change to
activate ChannelSubSys migration let us use the opportunity to introduce
ORB to the SubchDev before that (otherwise we would need separate
handling e.g. a compat property).
The ORB will be useful for implementing IDA, or async handling of
subchannel work.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <20170711145441.33925-5-pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Although we have recently vmstatified the migration of some css
infrastructure, for some css entities there is still state to be
migrated left, because the focus was keeping migration stream
compatibility (that is basically everything as-is).
Let us add vmstate helpers and extend existing vmstate descriptions so
that we have everything we need. Let us guard the added state via
css_migration_enabled, so we keep the compatible behavior if css
migration is disabled.
Let's also annotate the bits which do not need to be migrated for better
readability.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20170711145441.33925-4-pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Currently the migration of the channel subsystem (css) is only partial
and is done by the virtio ccw proxies -- the only migratable css devices
existing at the moment.
With the current work on emulated and passthrough devices we need to
decouple the migration of the channel subsystem state from virtio ccw,
and have a separate section for it. A new section however necessarily
breaks the migration compatibility.
So let us introduce a switch at the machine class, and put it in 'off'
state for now. We will turn the switch 'on' for future machines once all
preparations are met. For compatibility machines the switch will stay
'off'.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <20170711145441.33925-3-pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We will need the machine class at machine initialization time, so the
usual way via qdev won't do. Let's cache the machine class and also use
the default values of the base machine for capability discovery.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <20170711145441.33925-2-pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Let's use the new inject_airq callback of flic to inject adapter
interrupts. For kvm case, if the kernel flic doesn't support the new
interface, the irq routine remains unchanged. For non-kvm case,
qemu-flic handles the suppression process.
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Currently, we do nothing for the SIC instruction, but we need to
implement it properly. Let's add proper handling in the backend code.
Co-authored-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Let's introduce a specialized way to inject adapter interrupts that,
unlike the common interrupt injection method, allows to take the
characteristics of the adapter into account.
For adapters subject to AIS facility:
- for non-kvm case, we handle the suppression for a given ISC in QEMU.
- for kvm case, we pass adapter id to kvm to do airq injection.
Add add tracepoint for suppressed airq and suppressing airq.
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In order to emulate the adapter interruption suppression (AIS)
facility properly, the guest needs to be able to modify the AIS mask.
Interrupt suppression will be handled via the flic (for kvm, via a
recently introduced kernel backend; for !kvm, in the flic code), so
let's introduce a method to change the mode via the flic interface.
We introduce the 'simm' and 'nimm' fields to QEMUS390FLICState
to store interruption modes for each ISC. Each bit in 'simm' and
'nimm' targets one ISC, and collaboratively indicate three modes:
ALL-Interruptions, SINGLE-Interruption and NO-Interruptions. This
interface can initiate most transitions between the states; transition
from SINGLE-Interruption to NO-Interruptions via adapter interrupt
injection will be introduced in a following patch. The meaningful
combinations are as follows:
interruption mode | simm bit | nimm bit
------------------|----------|----------
ALL | 0 | 0
SINGLE | 1 | 0
NO | 1 | 1
Co-authored-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>