Commit Graph

294 Commits

Author SHA1 Message Date
Thomas Huth aaf87c6616 ppc/spapr: Use qemu_log_mask() for hcall_dprintf()
To see the output of the hcall_dprintf statements, you currently have
to enable the DEBUG_SPAPR_HCALLS macro in include/hw/ppc/spapr.h.
This is ugly because a) not every user who wants to debug guest
problems can or wants to recompile QEMU to be able to see such issues,
and b) since this macro is disabled by default, the code in the
hcall_dprintf() brackets tends to bitrot until somebody temporarily
enables that macro again.
Since the hcall_dprintf statements except one indicate guest
problems, let's always use qemu_log_mask(LOG_GUEST_ERROR, ...) for
this macro instead. One spot indicated an unimplemented host feature,
so this is changed into qemu_log_mask(LOG_UNIMP, ...) instead. Now
it's possible to see all those messages by simply adding the CLI
parameter "-d guest_errors,unimp", without the need to re-compile
the binary.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23 10:51:09 +10:00
Bharata B Rao a45863bda9 xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
When supporting CPU hot removal by parking the vCPU fd and reusing
it during hotplug again, there can be cases where we try to reenable
KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled.
Introduce a boolean member in ICPState to track this and don't
reenable the CAP if it was already enabled earlier.

Re-enabling this CAP should ideally work, but currently it results in
kernel trying to create and associate ICP with this vCPU and that
fails since there is already an ICP associated with it. Hence this
patch is needed to work around this problem in the kernel.

This change allows CPU hot removal to work for sPAPR.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 17:44:52 +02:00
Bharata B Rao db4ef288f4 spapr: Support ibm, lrdr-capacity device tree property
Add support for ibm,lrdr-capacity since this is needed by the guest
kernel to know about the possible hot-pluggable CPUs and Memory. With
this, pseries kernels will start reporting correct maxcpus in
/sys/devices/system/cpu/possible.

Also define the minimum hotpluggable memory size as 256MB.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: Fix compile error on 32bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 17:44:51 +02:00
David Gibson 183930c0d7 spapr: Add sPAPRMachineClass
Currently although we have an sPAPRMachineState descended from MachineState
we don't have an sPAPRMAchineClass descended from MachineClass.  So far it
hasn't been needed, but several upcoming features are going to want it,
so this patch creates a stub implementation.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 17:44:50 +02:00
David Gibson 1b71890729 spapr: Remove obsolete entry_point field from sPAPRMachineState
The sPAPRMachineState structure includes an entry_point field containing
the initial PC value for starting the machine, even though this always has
the value 0x100.

I think this is a hangover from very early versions which bypassed the
firmware when using -kernel.  In any case it has no function now, so remove
it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 17:44:50 +02:00
David Gibson fb16499418 spapr: Remove obsolete ram_limit field from sPAPRMachineState
The ram_limit field was imported from sPAPREnvironment where it predates
the machine's ram size being available generically from machine->ram_size.

Worse, the existing code was inconsistent about where it got the ram size
from.  Sometimes it used spapr->ram_limit, sometimes the global 'ram_size'
and sometimes a local 'ram_size' masking the global.

This cleans up the code to consistently use machine->ram_size, eliminating
spapr->ram_limit in the process.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 17:44:50 +02:00
David Gibson 28e0204254 spapr: Merge sPAPREnvironment into sPAPRMachineState
The code for -machine pseries maintains a global sPAPREnvironment structure
which keeps track of general state information about the guest platform.
This predates the existence of the MachineState structure, but performs
basically the same function.

Now that we have the generic MachineState, fold sPAPREnvironment into
sPAPRMachineState, the pseries specific subclass of MachineState.

This is mostly a matter of search and replace, although a few places which
relied on the global spapr variable are changed to find the structure via
qdev_get_machine().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 17:44:50 +02:00
Peter Maydell 2e29dd7c44 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQIcBAABAgAGBQJVcf+LAAoJEH3vgQaq/DkOMGYP/i/LxItDxCgfi+n6a1TIhgvs
 shp6FkatH50f0b4EoimmMhZJWZBU9GoTgNhULgreTm0fgv9d87ciFkabUZhJLO7e
 Sck+qMJYUpdB6goyVFC3efq7LpuwoGGBhekiQjMBe/a0f64BlYi8tL6hLpN/K3Cu
 i51ASrAD0rJvp58xAX9/Tgsopi8+++FPaOQzpUL81Zz6z6AGACkpepTndgxkt5DA
 +s5KHIod0+pUDVa6wjLxA3cWUPx8nA9emoRwl1eu98bwVPa/DX+TAa5ubLliIHtG
 CGwGZ97W0fHufmZbSCaCYsaHrRVKLBpmGQNorsZVcV/053yW8o5GKFcXMlH797sb
 dzcNvKPkHNlPJdPgU5akGeEtufesSWDqErOtlruoxR5HmBNHVkB4zFFpovAtrfGY
 YbpjrYk87W9OixQWOW6toHG3gE87jZnxj3GAIlWqJh1vSVqWWC2PNZTFqtQPu3lp
 gu3fv3kaa/EbLujmlfq6uD4MFQZlYjJG6JelLydLdimTnr/ad43vABbXGfgaDKAT
 X++CN9tst0hiuj3n/dJcDkox//U+yCbnPRk00H1bjT//YYJULtCeZk7Y0hVaB5mZ
 36PrTAK2FlTZ5caK7KrEtwzsRL4b61mrhzFi7E+Gd31wEiZA30fDu16sX0f+8+ex
 BqwG0voyXrQp1yx+M8cH
 =TBeO
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Fri Jun  5 20:59:07 2015 BST using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request:
  macio: remove remainder_len DBDMA_io property
  macio: update comment/constants to reflect the new code
  macio: switch pmac_dma_write() over to new offset/len implementation
  macio: switch pmac_dma_read() over to new offset/len implementation
  fdc-test: Test state for existing cases more thoroughly
  fdc: Fix MSR.RQM flag
  fdc: Disentangle phases in fdctrl_read_data()
  fdc: Code cleanup in fdctrl_write_data()
  fdc: Use phase in fdctrl_write_data()
  fdc: Introduce fdctrl->phase
  fdc: Rename fdctrl_set_fifo() to fdctrl_to_result_phase()
  fdc: Rename fdctrl_reset_fifo() to fdctrl_to_command_phase()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-08 14:07:32 +01:00
Mark Cave-Ayland 0ba98885a0 macio: remove remainder_len DBDMA_io property
Since the block alignment code is now effectively independent of the DMA
implementation, this variable is no longer required and can be removed.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1433455177-21243-5-git-send-email-mark.cave-ayland@ilande.co.uk
Signed-off-by: John Snow <jsnow@redhat.com>
2015-06-04 20:25:39 -04:00
Mark Cave-Ayland ac58fe7b2c macio: switch pmac_dma_write() over to new offset/len implementation
In particular, this fixes a bug whereby chains of overlapping head/tail chains
would incorrectly write over each other's remainder cache. This is the access
pattern used by OS X/Darwin and fixes an issue with a corrupt Darwin
installation in my local tests.

While we are here, rename the DBDMA_io struct property remainder to
head_remainder for clarification.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1433455177-21243-3-git-send-email-mark.cave-ayland@ilande.co.uk
Signed-off-by: John Snow <jsnow@redhat.com>
2015-06-04 20:25:39 -04:00
Michael Roth e4b798bb53 spapr_drc: add spapr_drc_populate_dt()
This function handles generation of ibm,drc-* array device tree
properties to describe DRC topology to guests. This will by used
by the guest to direct RTAS calls to manage any dynamic resources
we associate with a particular DR Connector as part of
hotplug/unplug.

Since general management of boot-time device trees are handled
outside of sPAPRDRConnector, we insert these values blindly given
an FDT and offset. A mask of sPAPRDRConnector types is given to
instruct us on what types of connectors entries should be generated
for, since descriptions for different connectors may live in
different parts of the device tree.

Based on code originally written by Nathan Fontenot.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:54 +02:00
Tyrel Datwyler 79853e18d9 spapr_events: event-scan RTAS interface
We don't actually rely on this interface to surface hotplug events, and
instead rely on the similar-but-interrupt-driven check-exception RTAS
interface used for EPOW events. However, the existence of this interface
is needed to ensure guest kernels initialize the event-reporting
interfaces which will in turn be used by userspace tools to handle these
events, so we implement this interface here.

Since events surfaced by this call are mutually exclusive to those
surfaced via check-exception, we also update the RTAS event queue code
to accept a boolean to mark/filter for events accordingly.

Events of this sort are not currently generated by QEMU, but the interface
has been tested by surfacing hotplug events via event-scan in place
of check-exception.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:53 +02:00
Nathan Fontenot 31fe14d15d spapr_events: re-use EPOW event infrastructure for hotplug events
This extends the data structures currently used to report EPOW events to
guests via the check-exception RTAS interfaces to also include event types
for hotplug/unplug events.

This is currently undocumented and being finalized for inclusion in PAPR
specification, but we implement this here as an extension for guest
userspace tools to implement (existing guest kernels simply log these
events via a sysfs interface that's read by rtas_errd, and current
versions of rtas_errd/powerpc-utils already support the use of this
mechanism for initiating hotplug operations).

We also add support for queues of pending RTAS events, since in the
case of hotplug there's chance for multiple events being in-flight
at any point in time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:53 +02:00
Michael Roth 46503c2bc0 spapr_rtas: add ibm, configure-connector RTAS interface
This interface is used to fetch an OF device-tree nodes that describes a
newly-attached device to guest. It is called multiple times to walk the
device-tree node and fetch individual properties into a 'workarea'/buffer
provided by the guest.

The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
the initial hotplug operation, and the state of these RTAS calls is tracked by
the sPAPRDRConnector. When the last of these properties is successfully
fetched, we report as special return value to the guest and transition
the device to a 'configured' state on the QEMU/DRC side.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:53 +02:00
Michael Roth ab316865db spapr: add rtas_st_buffer_direct() helper
This is similar to the existing rtas_st_buffer(), but for cases
where the guest is not expecting a length-encoded byte array.
Namely, for calls where a "work area" buffer is used to pass
around arbitrary fields/data.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:53 +02:00
Mike Day 8c8639df32 spapr_rtas: add set-indicator RTAS interface
This interface allows a guest to control various platform/device
sensors. Initially, we only implement support necessary to control
sensors that are required for hotplug: DR connector indicators/LEDs,
resource allocation state, and resource isolation state.

See docs/specs/ppc-spapr-hotplug.txt for a complete description of
this interface.

Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:53 +02:00
Michael Roth bbf5c878ab spapr_drc: initial implementation of sPAPRDRConnector device
This device emulates a firmware abstraction used by pSeries guests to
manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
memory, and CPUs. It is conceptually similar to an SHPC device,
complete with LED indicators to identify individual slots to physical
physical users and indicate when it is safe to remove a device. In
some cases it is also used to manage virtualized resources, such a
memory, CPUs, and physical-host bridges, which in the case of pSeries
guests are virtualized resources where the physical components are
managed by the host.

Guests communicate with these DR Connectors using RTAS calls,
generally by addressing the unique DRC index associated with a
particular connector for a particular resource. For introspection
purposes we expose this state initially as QOM properties, and
in subsequent patches will introduce the RTAS calls that make use of
it. This constitutes to the 'guest' interface.

On the QEMU side we provide an attach/detach interface to associate
or cleanup a DeviceState with a particular sPAPRDRConnector in
response to hotplug/unplug, respectively. This constitutes the
'physical' interface to the DR Connector.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:52 +02:00
Thomas Huth f9ce8e0aa3 hw/ppc/spapr_iommu: Fix the check for invalid upper bits in liobn
The check "liobn & 0xFFFFFFFF00000000ULL" in spapr_tce_find_by_liobn()
is completely useless since liobn is only declared as an uint32_t
parameter. Fix this by using target_ulong instead (this is what most
of the callers of this function are using, too).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:51 +02:00
Alexey Kardashevskiy fae807a2b1 spapr_iommu: Make spapr_tce_find_by_liobn() public
At the moment spapr_tce_find_by_liobn() is used by H_PUT_TCE/...
handlers to find an IOMMU by LIOBN.

We are going to implement Dynamic DMA windows (DDW), new code
will go to a new file and we will use spapr_tce_find_by_liobn()
there too so let's make it public.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:51 +02:00
Alexey Kardashevskiy d9d96a3cc7 spapr_iommu: Add separate trace points for PCI DMA operations
This is to reduce VIO noise while debugging PCI DMA.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:51 +02:00
Alexey Kardashevskiy 4290ca49ee spapr_vio: Introduce a liobn number generating macros
This introduces a macro which makes up a LIOBN from fixed prefix and
VIO device address (@reg property).

This is to keep LIOBN macros rendering consistent - the same macro for
PCI has been added by the previous patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:50 +02:00
Alexey Kardashevskiy c8545818b3 spapr_pci: Introduce a liobn number generating macros
We are going to have multiple DMA windows per PHB and we want them to
migrate so we need a predictable way of assigning LIOBNs.

This introduces a macro which makes up a LIOBN from fixed prefix,
PHB index (unique PHB id) and window number.

This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number
from LIOBN. It is used to distinguish the default 32bit windows from
dynamic windows and avoid picking default DMA window properties from
a wrong TCE table.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:50 +02:00
Mark Cave-Ayland bd4214fc92 macio: move unaligned DMA write code into separate pmac_dma_write() function
Similarly switch the macio IDE routines over to use the new function and
tidy-up the remaining code as required.

[Maintainer edit: printf format codes adjusted for 32/64bit. --js]

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: John Snow <jsnow@redhat.com>
Message-id: 1425939893-14404-3-git-send-email-mark.cave-ayland@ilande.co.uk
Signed-off-by: John Snow <jsnow@redhat.com>
2015-05-22 15:58:22 -04:00
Gavin Shan ee954280da sPAPR: Implement EEH RTAS calls
The emulation for EEH RTAS requests from guest isn't covered
by QEMU yet and the patch implements them.

The patch defines constants used by EEH RTAS calls and adds
callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset,
eeh_configure}, which are going to be used as follows:

  * RTAS calls are received in spapr_pci.c, sanity check is done
    there.
  * RTAS handlers handle what they can. If there is something it
    cannot handle and the corresponding sPAPRPHBClass callback is
    defined, it is called.
  * Those callbacks are only implemented for VFIO now. They do ioctl()
    to the IOMMU container fd to complete the calls. Error codes from
    that ioctl() are transferred back to the guest.

[aik: defined RTAS tokens for EEH RTAS calls]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 15:00:08 +01:00
Markus Armbruster 28b07e737e spapr_vio: Convert to realize()
Bonus fix: always set an error on failure.  Some failures were silent
before, except for the generic error set by device_realize().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 15:00:07 +01:00
David Gibson eefaccc02b pseries: Switch VGA endian on H_SET_MODE
When the guest switches the interrupt endian mode, which essentially
means a global machine endian switch, we want to change the VGA
framebuffer endian mode as well in order to be backward compatible
with existing guests who don't know about the new endian control
register.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 15:00:03 +01:00
David Gibson 880ae7de59 pseries: Move rtc_offset into RTC device's state structure
The initial creation of the PAPR RTC qdev class left a wart - the rtc's
offset was left in the sPAPREnvironment structure, accessed via a global.

This patch moves it into the RTC device's own state structure, were it
belongs.  This requires a small change to the migration stream format.  In
order to handle incoming streams from older versions, we also need to
retain the rtc_offset field in the sPAPREnvironment structure, so that it
can be loaded into via the vmsd, then pushed into the RTC device.

Since we're changing the migration format, this also takes the opportunity
to:

  * Change the rtc offset from a value in seconds to a value in
    nanoseconds, allowing nanosecond offsets between host and guest
    rtc time, if desired.

  * Remove both the already unused "next_irq" field and now unused
    "rtc_offset" field from the new version of the spapr migration
    stream

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 14:59:58 +01:00
David Gibson 28df36a13a pseries: Make the PAPR RTC a qdev device
At present the PAPR RTC isn't a "device" as such - it's accessed only via
firmware/hypervisor calls, and is handled in the sPAPR core code.  This
becomes inconvenient as we extend it in various ways.

This patch makes the PAPR RTC a separate device in the qemu device model.

For now, the only piece of device state - the rtc_offset - is still kept in
the global sPAPREnvironment structure.  That's clearly wrong, but leaving
it to be fixed in a following patch makes for a clearer separation between
the internal re-organization of the device, and the behavioural changes
(because the migration stream format needs to change slightly when the
offset is moved into the device's own state).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 14:59:58 +01:00
David Gibson e5dad1d7d1 pseries: Add spapr_rtc_read() helper function
The virtual RTC time is used in two places in the pseries machine.  First
is in the RTAS get-time-of-day function which returns the RTC time to the
guest.  Second is in the spapr events code which is used to timestamp
event messages from the hypervisor to the guest.

Currently both call qemu_get_timedate() directly, but we want to change
that so we can properly handle the various -rtc options.  In preparation,
create a helper function to return the virtual RTC time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 14:59:57 +01:00
David Gibson 12f421745c pseries: Move sPAPR RTC code into its own file
At the moment the RTAS (firmware/hypervisor) time of day functions are
implemented in spapr_rtas.c along with a bunch of other things.  Since
we're going to be expanding these a bit, move the RTAS RTC related code
out into new file spapr_rtc.c.  Also add its own initialization function,
spapr_rtc_init() called from the main machine init routine.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 14:59:56 +01:00
Alexey Kardashevskiy ee9a569ab8 spapr_vio/spapr_iommu: Move VIO bypass where it belongs
Instead of tweaking a TCE table device by adding there a bypass flag,
let's add an alias to RAM and IOMMU memory region, and enable/disable
those according to the selected bypass mode.
This way IOMMU memory region can have size of the actual window rather
than ram_size which is essential for upcoming DDW support.

This moves bypass logic to VIO layer and keeps @bypass flag in TCE table
for migration compatibility only. This replaces spapr_tce_set_bypass()
calls with explicit assignment to avoid confusion as the function could
do something more that just syncing the @bypass flag.

This adds a pointer to VIO device into the sPAPRTCETable struct to provide
the sPAPRTCETable device a way to update bypass mode for the VIO device.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09 14:59:52 +01:00
Samuel Mendoza-Jonas 01a579729b spapr: Fix stale HTAB during live migration (KVM)
If a guest reboots during a running migration, changes to the
hash page table are not necessarily updated on the destination.
Opening a new file descriptor to the HTAB forces the migration
handler to resend the entire table.

Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-01-07 16:16:26 +01:00
Greg Kurz 8c46f7ec85 spapr_pci: map the MSI window in each PHB
On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
Commit cc943c36fa has modified MSI-X
so that writes are made using the bus master address space and follow
the IOMMU path.

Unfortunately, the IOMMU address space address space does not have an
MSI window: the notification is silently dropped in unassigned_mem_write
instead of reaching the guest... The most visible effect is that all
virtio devices are non-functional on sPAPR since then. :(

This patch does the following:
1) map the MSI window into the IOMMU address space for each PHB
   - since each PHB instantiates its own IOMMU address space, we
     can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
   - no real need to keep the MSI window setup in a separate function,
     the spapr_pci_msi_init() code moves to spapr_phb_realize().

2) kill the global MSI window as it is not needed in the end

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08 12:50:53 +02:00
Benjamin Herrenschmidt b7d1f77ada spapr: Locate RTAS and device-tree based on real RMA
We currently calculate the final RTAS and FDT location based on
the early estimate of the RMA size, cropped to 256M on KVM since
we only know the real RMA size at reset time which happens much
later in the boot process.

This means the FDT and RTAS end up right below 256M while they
could be much higher, using precious RMA space and limiting
what the OS bootloader can put there which has proved to be
a problem with some OSes (such as when using very large initrd's)

Fortunately, we do the actual copy of the device-tree into guest
memory much later, during reset, late enough to be able to do it
using the final RMA value, we just need to move the calculation
to the right place.

However, RTAS is still loaded too early, so we change the code to
load the tiny blob into qemu memory early on, and then copy it into
guest memory at reset time. It's small enough that the memory usage
doesn't matter.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[aik: fixed errors from checkpatch.pl, defined RTAS_MAX_ADDR]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: fix compilation on 32bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08 12:50:48 +02:00
Alexander Graf 261265cc91 PPC: mac99: Move NVRAM to page boundary when necessary
When running KVM we have to adhere to host page boundaries for memory slots.
Unfortunately the NVRAM on mac99 is a 4k RAM hole inside of an MMIO flash
area.

So if our host is configured with 64k page size, we can't use the mac99 target
with KVM. This is a real shame, as this limitation is not really an issue - we
can easily map NVRAM somewhere else and at least Linux and Mac OS X use it
at their new location.

So in that emergency case when it's about failing to run at all and moving NVRAM
to a place it shouldn't be at, choose the latter.

This patch enables -M mac99 with KVM on 64k page size hosts.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08 12:50:47 +02:00
Nikunj A Dadhania 2e14072f9e ppc: spapr-rtas - implement os-term rtas call
PAPR compliant guest calls this in absence of kdump. This finally
reaches the guest and can be handled according to the policies set by
higher level tools(like taking dump) for further analysis by tools like
crash.

Linux kernel calls ibm,os-term when extended property of os-term is set.
This makes sure that a return to the linux kernel is gauranteed.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[agraf: reduce RTAS_TOKEN_MAX]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08 12:50:45 +02:00
Alexey Kardashevskiy 9a321e9234 spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Currently SPAPR PHB keeps track of all allocated MSI (here and below
MSI stands for both MSI and MSIX) interrupt because
XICS used to be unable to reuse interrupts. This is a problem for
dynamic MSI reconfiguration which happens when guest reloads a driver
or performs PCI hotplug. Another problem is that the existing
implementation can enable MSI on 32 devices maximum
(SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.

This makes use of new XICS ability to reuse interrupts.

This reorganizes MSI information storage in sPAPRPHBState. Instead of
static array of 32 descriptors (one per a PCI function), this patch adds
a GHashTable when @config_addr is a key and (first_irq, num) pair is
a value. GHashTable can dynamically grow and shrink so the initial limit
of 32 devices is gone.

This changes migration stream as @msi_table was a static array while new
@msi_devs is a dynamic hash table. This adds temporary array which is
used for migration, it is populated in "spapr_pci"::pre_save() callback
and expanded into the hash table in post_load() callback. Since
the destination side does not know the number of MSI-enabled devices
in advance and cannot pre-allocate the temporary array to receive
migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
which allocates the array automatically.

This resets the MSI configuration space when interrupts are released by
the ibm,change-msi RTAS call.

This fixed traces to be more informative.

This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
was incorrect by accident. As the internal representation changed,
thus bumps migration version number.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: drop g_malloc_n usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:27 +02:00
Alexey Kardashevskiy 51bba713fe xics: Implement xics_ics_free()
This implements interrupt release function so IRQs can be returned back
to the pool for reuse in cases such as PCI hot plug.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Alexey Kardashevskiy bee763dbfb spapr: Move interrupt allocator to xics
The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.

This moves an allocator from SPAPR to XICS.

This switches IRQ users to use new API.

This uses LSI/MSI flags to know if interrupt is allocated.

The interrupt release function will be posted as a separate patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Alexey Kardashevskiy 4af88944d0 xics: Add flags for interrupts
The existing interrupt allocation scheme in SPAPR assumes that
interrupts are allocated at the start time, continously and the config
will not change. However, there are cases when this is not going to work
such as:

1. migration - we will have to have an ability to choose interrupt
numbers for devices in the command line and this will create gaps in
interrupt space.

2. PCI hotplug - interrupts from unplugged device need to be returned
back to interrupt pool, otherwise we will quickly run out of interrupts.

This replaces a separate lslsi[] array with a byte in the ICSIRQState
struct and defines "LSI" and "MSI" flags. Neither of these flags set
signals that the descriptor is not allocated and not in use.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Sam bobroff 3b50d8974b spapr: Add RTAS sysparm SPLPAR Characteristics
Add support for the SPLPAR Characteristics parameter to the emulated
RTAS call ibm,get-system-parameter.

The support provides just enough information to allow "cat
/proc/powerpc/lparcfg" to succeed without generating a kernel error
message.

Without this patch the above command will produce the following kernel
message: arch/powerpc/platforms/pseries/lparcfg.c \
parse_system_parameter_string Error calling get-system-parameter \
(0xfffffffd)

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Sam bobroff b907d7b0fd spapr: Add RTAS sysparm UUID
Add support for the UUID parameter to the emulated RTAS call
ibm,get-system-parameter.

Return the guest's UUID as the value for the RTAS UUID system
parameter, or null (a zero length result) if it is not set.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Sam bobroff 3052d95190 spapr: Fix RTAS sysparm DIAGNOSTICS_RUN_MODE
This allows the ibm,get-system-parameter RTAS call to succeed for the
DIAGNOSTICS_RUN_MODE system parameter.

The problem can be seen with "ppc64_cpu --run-mode" from the
powerpc-utils package which fails before this patch with "Machine does
not support diagnostic run mode".

This is corrected by using the rtas_st_buffer() function to write to
the buffer.

The RTAS constants are also moved out into a header file, some new
constants added and the surrounding code slightly simplified.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[agraf: remove some commentary]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:25 +02:00
Sam bobroff ce3fa1eca2 spapr: Add rtas_st_buffer utility function
Add a function to write lengh + data into a buffer as required for the
emulation of the RTAS ibm,get-system-parameter call.

If the destination is smaller than the source, the write is truncated
and success is returned. This matches the behaviour of pHyp.

This will be used in following patches.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:25 +02:00
Alexey Kardashevskiy 9bb62a0702 spapr_iommu: Make in-kernel TCE table optional
POWER KVM supports an KVM_CAP_SPAPR_TCE capability which allows allocating
TCE tables in the host kernel memory and handle H_PUT_TCE requests
targeted to specific LIOBN (logical bus number) right in the host without
switching to QEMU. At the moment this is used for emulated devices only
and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE
handler finds a LIOBN and corresponding table, it will put a TCE to
the table and complete hypercall execution. The user space will not be
notified.

Upcoming VFIO support is going to use the same sPAPRTCETable device class
so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
tables for VFIO are going to be allocated in the host as well.
However VFIO operates with real IOMMU tables and simple copying of
a TCE to the real hardware TCE table will not work as guest physical
to host physical address translation is requited.

So until the host kernel gets VFIO support for H_PUT_TCE, we better not
to register VFIO's TCE in the host.

This adds a place holder for KVM_CAP_SPAPR_TCE_VFIO capability. It is not
in upstream yet and being discussed so now it is always false which means
that in-kernel VFIO acceleration is not supported.

This adds a bool @vfio_accel flag to the sPAPRTCETable device telling
that sPAPRTCETable should not try allocating TCE table in the host kernel
for VFIO. The flag is false now as at the moment there is no VFIO.

This adds an vfio_accel parameter to spapr_tce_new_table(), the semantic
is the same. Since there is only emulated PCI and VIO now, the flag is set
to false. Upcoming VFIO support will set it to true.

This is a preparation patch so no change in behaviour is expected

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:23 +02:00
Alexey Kardashevskiy 3a3b8502e6 spapr: Fix RTAS token numbers
At the moment spapr_rtas_register() allocates a new token number for every
new RTAS callback so numbers are not fixed and depend on the number of
supported RTAS handlers and the exact order of spapr_rtas_register() calls.
These tokens are copied into the device tree and remain the same during
the guest lifetime.

When we start another guest to receive a migration, it calls
spapr_rtas_register() as well. If the number of RTAS handlers or their
order is different in QEMU on source and destination sides, the "/rtas"
node in the device tree will differ. Since migration overwrites the device
tree (as it overwrites the entire RAM), the actual RTAS config on
the destination side gets broken.

This defines global contant values for every RTAS token which QEMU
is using today.

This changes spapr_rtas_register() to accept a token number instead of
allocating one. This changes all users of spapr_rtas_register().

This changes XICS-KVM not to cache tokens registered with KVM as they
constant now.

This makes TOKEN_BASE global as RTAS_XXX use TOKEN_BASE as
a base. TOKEN_MAX is moved and renamed too and its value is changed
to the last token + 1. Boundary checks for token values are adjusted.

This reserves token numbers for "os-term" handlers and PCI hotplug
which we are working on.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:22 +02:00
Badari Pulavarty 9dbae97723 spapr_pci: Advertise MSI quota
Hotplug of multiple disks fails due to MSI vector quota check.
Number of MSI vectors default to 8 allowing only 4 devices.
This happens on RHEL6.5 guest. RHEL7 and SLES11 guests fallback
to INTX.

One way to workaround the issue is to increase total MSIs,
so that MSI quota check allows us to hotplug multiple disks.

This sets the quota to the maximum number of interupts XICS has
which is 1024 now (XICS_IRQS). This moves XICS_IRQS from spapr.c
to xics.h for wider visibility.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
[aik: put XICS_IRQS=1024 instead of 64i, fixed endianness and size]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:46 +02:00
Alexey Kardashevskiy d5ac4f5433 spapr_hcall: Add address-translation-mode-on-interrupt resource in H_SET_MODE
This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
the H_SET_MODE, for POWER8 (PowerISA 2.07) only.

This defines AIL flags for LPCR special register.

This changes @excp_prefix according to the mode, takes effect in TCG.

This turns support of a new capability PPC2_ISA207S flag for TCG.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:45 +02:00
Alexey Kardashevskiy 1b8eceee28 spapr_iommu: Introduce bus_offset in sPAPRTCETable
This adds @bus_offset into sPAPRTCETable to tell where TCE table starts
from. It is set to 0 for emulated devices. Dynamic DMA windows will use
other offset.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:39 +02:00
Alexey Kardashevskiy 650f33adbd spapr_iommu: Introduce page_shift in sPAPRTCETable
At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
spec allows other page sizes and we are going to implement them, we need
page size to be configrable.

This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
with it where it is possible.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:39 +02:00
Alexey Kardashevskiy 523e7b8ab8 spapr_iommu: Get rid of window_size in sPAPRTCETable
This removes window_size as it is basically a copy of nb_table
shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
going to support windows as big as the entire RAM and this number
will be bigger that 32 capacity, we will have to do something
about @window_size anyway and removal seems to be the right way to go.

This removes dma_window_start/dma_window_size from sPAPRPHBState as
they are no longer used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:39 +02:00
Alexander Graf 3e300fa6ad macio ide: Do remainder access asynchronously
The macio IDE controller has some pretty nasty magic in its implementation to
allow for unaligned sector accesses. We used to handle these accesses
synchronously inside the IO callback handler.

However, the block infrastructure changed below our feet and now it's impossible
to call a synchronous block read/write from the aio callback handler of a
previous block access.

Work around that limitation by making the unaligned handling bits also go
through our asynchronous handler.

This fixes booting Mac OS X for me.

Reported-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:38 +02:00
Alexey Kardashevskiy 2a6593cb6a spapr: Add ibm, client-architecture-support call
The PAPR+ specification defines a ibm,client-architecture-support (CAS)
RTAS call which purpose is to provide a negotiation mechanism for
the guest and the hypervisor to work out the best compatibility parameters.
During the negotiation process, the guest provides an array of various
options and capabilities which it supports, the hypervisor adjusts
the device tree and (optionally) reboots the guest.

At the moment the Linux guest calls CAS method at early boot so SLOF
gets called. SLOF allocates a memory buffer for the device tree changes
and calls a custom KVMPPC_H_CAS hypercall. QEMU parses the options,
composes a diff for the device tree, copies it to the buffer provided
by SLOF and returns to SLOF. SLOF updates the device tree and returns
control to the guest kernel. Only then the Linux guest parses the device
tree so it is possible to avoid unnecessary reboot in most cases.

The device tree diff is a header with an update format version
(defined as 1 in this patch) followed by a device tree with the properties
which require update.

If QEMU detects that it has to reboot the guest, it silently does so
as the guest expects reboot to happen because this is usual pHyp firmware
behavior.

This defines custom KVMPPC_H_CAS hypercall. The current SLOF already
has support for it.

This implements stub which returns very basic tree (root node,
no properties) to the guest.

As the return buffer does not contain any change, no change in behavior is
expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:37 +02:00
Alexey Kardashevskiy 98a8b52442 spapr: Add support for time base offset migration
This allows guests to have a different timebase origin from the host.

This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
970.

This adds kvm_access_one_reg() to access a special register which is not
in env->spr. This requires kvm_set_one_reg/kvm_get_one_reg patch.

The feature must be present in the host kernel.

This bumps vmstate_spapr::version_id and enables new vmstate_ppc_timebase
only for it. Since the vmstate_spapr::minimum_version_id remains
unchanged, migration from older QEMU is supported but without
vmstate_ppc_timebase.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:35 +02:00
BALATON Zoltan 9d1c128341 mac99: Added FW_CFG_PPC_BUSFREQ to match CLOCKFREQ and TBFREQ already there
While there, also moved the hard coded value for CLOCKFREQ to a #define.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16 13:24:28 +02:00
Alexander Graf e81a982aa5 PPC: Clean up DECR implementation
There are 3 different variants of the decrementor for BookE and BookS.

The BookE variant sets TSR[DIS] to 1 when the DEC value becomes 1 or 0. TSR[DIS]
is then the indicator whether the decrementor interrupt line is asserted or not.

The old BookS variant treats DEC as an edge interrupt that gets triggered when
the DEC value's top bit turns 1 from 0.

The new BookS variant maintains the assertion bit inside DEC itself. Whenever
the DEC value becomes negative (top bit set) the DEC interrupt line is asserted.

So far we implemented mostly the old BookS variant. Let's do them all properly.

This fixes booting pseries ppc64 guest images in TCG mode for me.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-04-08 11:20:04 +02:00
Alexey Kardashevskiy a46622fd07 spapr_hcall: Fix little-endian resource handling in H_SET_MODE
This changes resource code definitions to ones used in the host kernel.

This fixes H_SET_MODE_RESOURCE_LE (switch between big endian and
little endian) to sync registers from KVM before changing LPCR value.

This adds a set_spr() helper to update an SPR in a CPU's context to avoid
possible races and makes use of it to change LPCR.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-20 02:39:33 +01:00
Edgar E. Iglesias ab1da85791 exec: Make stl_*_phys input an AddressSpace
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-02-11 22:57:18 +10:00
Edgar E. Iglesias fdfba1a298 exec: Make ldl_*_phys input an AddressSpace
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-02-11 22:56:54 +10:00
Alexey Kardashevskiy 3ada6b1137 spapr-rtas: add ibm, (get|set)-system-parameter
This adds very basic handlers for ibm,get-system-parameter and
ibm,set-system-parameter RTAS calls.

The only parameter handled at the moment is
"platform-processor-diagnostics-run-mode" which is always disabled and
does not support changing. This is expected to make
"ppc64_cpu --run-mode=1" happy.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: s/papameter/parameter/g]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-20 01:57:59 +01:00
Alexey Kardashevskiy a64d325df1 spapr-rtas: replace return code constants with macros
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-20 01:57:59 +01:00
Stefan Weil 1246b259f8 misc: Replace 'struct QEMUTimer' by 'QEMUTimer'
Most code already used QEMUTimer without the redundant 'struct' keyword.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-12-02 21:03:39 +04:00
Andreas Färber 3bbf37f269 spapr: Use DeviceClass::fw_name for device tree CPU node
Instead of relying on cpu_model, obtain the device tree node label
per CPU. Use DeviceClass::fw_name as source.

Whenever DeviceClass::fw_name is unknown, default to "PowerPC,UNKNOWN".

As a consequence, spapr_fixup_cpu_dt() can operate on each CPU's fw_name,
obsoleting sPAPREnvironment::cpu_model, and spapr_create_fdt_skel() can
drop its cpu_model argument.

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-25 23:25:48 +02:00
Benjamin Herrenschmidt 5d87e4b74a xics: Implement H_XIRR_X
This implements H_XIRR_X hypercall in addition to H_XIRR as
it is mandatory for PAPR+ and there is no way for the guest to
detect whether it is supported or not so just add it.

As the Partition Adjunct Option is not supported at the moment,
the CPPR parameter of the hypercall is ignored.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-25 23:25:47 +02:00
David Gibson 11ad93f681 xics-kvm: Support for in-kernel XICS interrupt controller
Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
controller system within KVM.  This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.

This should give considerable performance improvements.  e.g. on a simple
IPI ping-pong test between hardware threads, using qemu XICS gives us
around 5,000 irqs/second, whereas the in-kernel XICS gives us around
70,000 irqs/s on the same hardware configuration.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[Mike Qiu <qiudayu@linux.vnet.ibm.com>: fixed mistype which caused ics_set_kvm_state() to fail]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-25 23:25:47 +02:00
Alexey Kardashevskiy 5eb92ccc3f xics: add cpu_setup callback
This adds a cpu_setup callback to the XICS device class (as XICS-KVM
will do it different), xics_cpu_setup() will call it if it is set.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-25 23:25:47 +02:00
Alexey Kardashevskiy 5a3d7b23ba xics: split to xics and xics-common
The upcoming XICS-KVM support will use bits of emulated XICS code.
So this introduces new level of hierarchy - "xics-common" class. Both
emulated XICS and XICS-KVM will inherit from it and override class
callbacks when required.

The new "xics-common" class implements:
1. replaces static "nr_irqs" and "nr_servers" properties with
the dynamic ones and adds callbacks to be executed when properties
are set.
2. xics_cpu_setup() callback renamed to xics_common_cpu_setup() as
it is a common part for both XICS'es
3. xics_reset() renamed to xics_common_reset() for the same reason.

The emulated XICS changes:
1. the part of xics_realize() which creates ICPs is moved to
the "nr_servers" property callback as realize() is too late to
create/initialize devices and instance_init() is too early to create
devices as the number of child devices comes via the "nr_servers"
property.
2. added ics_initfn() which does a little part of what xics_realize() did.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-25 23:25:47 +02:00
Alexey Kardashevskiy d1b5682d88 xics: add pre_save/post_load dispatchers
The upcoming support of in-kernel XICS will redefine migration callbacks
for both ICS and ICP so classes and callback pointers are added.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-25 23:25:46 +02:00
Alexey Kardashevskiy 4fe822e075 spapr-rtas: fix h_rtas parameters reading
On the real hardware, RTAS is called in real mode and therefore
top 4 bits of the address passed in the call are ignored.
So does the patch.

This converts h_rtas() to use existing rtas_ld() handlers.

This fixed rtas_ld()/rtas_st() to ignore top 4 bits.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-25 23:25:46 +02:00
Anton Blanchard 42561bf2e4 pseries: Add H_SET_MODE hcall to change guest exception endianness
H_SET_MODE is used for controlling various partition settings. One
of these settings is the endianness a guest takes its exceptions in.

Signed-off-by: Anton Blanchard <anton@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-09-02 10:06:42 +02:00
Alexey Kardashevskiy f1c2dc7c86 spapr-pci: rework MSI/MSIX
On the sPAPR platform a guest allocates MSI/MSIX vectors via RTAS
hypercalls which return global IRQ numbers to a guest so it only
operates with those and never touches MSIMessage.

Therefore MSIMessage handling is completely hidden in QEMU.

Previously every sPAPR PCI host bridge implemented its own MSI window
to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
or vfio) and route them to the guest via qemu_pulse_irq().
MSIMessage used to be encoded as:
	.addr - address within the PHB MSI window;
	.data - the device index on PHB plus vector number.
The MSI MR write function translated this MSIMessage to a global IRQ
number and called qemu_pulse_irq().

However the total number of IRQs is not really big (at the moment it is
1024 IRQs starting from 4096) and even 16bit data field of MSIMessage
seems to be enough to store an IRQ number there.

This simplifies MSI handling in sPAPR PHB. Specifically, this does:
1. remove a MSI window from a PHB;
2. add a single memory region for all MSIs to sPAPREnvironment
and spapr_pci_msi_init() to initialize it;
3. encode MSIMessage as:
    * .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
    * .data as an IRQ number.
4. change IRQ allocator to align first IRQ number in a block for MSI.
MSI uses lower bits to specify the vector number so the first IRQ has to
be aligned. MSIX does not need any special allocator though.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-09-02 10:06:42 +02:00
Anthony Liguori c04d6cfa3f xics: rename types to be sane and follow coding style
Basically, in HW the layout of the interrupt network is:

     - One ICP per processor thread (the "presenter"). This contains the
    registers to fetch a pending interrupt (ack), EOI, and control the
    processor priority.

     - One ICS per logical source of interrupts (ie, one per PCI host
    bridge, and a few others here or there). This contains the per-interrupt
    source configuration (target processor(s), priority, mask) and the
    per-interrupt internal state.

    Under PAPR, there is a single "virtual" ICS ... somewhat (it's a bit
    oddball what pHyp does here, arguably there are two but we can ignore
    that distinction). There is no register level access. A pair of firmware
    (RTAS) calls is used to configure each virtual interrupt.

    So our model here is somewhat the same. We have one ICS in the emulated
    XICS which arguably *is* the emulated XICS, there's no point making it a
    separate "device", that would just be gross, and each VCPU has an
    associated ICP.

Yet we call the "XICS" struct icp_state and then the ICPs
'struct icp_server_state'.  It's particularly confusing when all of the
functions have xics_prefixes yet take *icp arguments.

Rename:

  struct icp_state -> XICSState
  struct icp_server_state -> ICPState
  struct ics_state -> ICSState
  struct ics_irq_state -> ICSIRQState

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-id: 1374175984-8930-12-git-send-email-aliguori@us.ibm.com
[aik: added ics_resend() on post_load]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 10:37:09 -05:00
Alexey Kardashevskiy e68cb8b4fa pseries: savevm support with KVM
At present, the savevm / migration support for the pseries machine will not
work when KVM is enabled.  That's because KVM manages the guest's hash page
table in the host kernel, so qemu has no visibility of it.  This patch
fixes this by using new kernel interfaces to extract and reinsert the
guest's hash table during the migration process.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 1374175984-8930-11-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 10:37:09 -05:00
David Gibson 4be21d561d pseries: savevm support for pseries machine
This adds the necessary pieces to implement savevm / migration for the
pseries machine.  The most complex part here is migrating the hash
table - for the paravirtualized pseries machine the guest's hash page
table is not stored within guest memory, but externally and the guest
accesses it via hypercalls.

This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
(tracking changes to the HPTE itself, not the page it references).
This is used to implement a live migration style incremental save and
restore of the hash table contents.

Normally a hash table is 16MB but it can get bigger depending on how
much RAM the guest has. Due to its nature, updates to it are random so
the live migration style is used for it.

In addition it adds VMStateDescription information to save and restore
the (few) remaining pieces of state information needed by the pseries
machine.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1374175984-8930-9-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 10:37:08 -05:00
Anthony Liguori a83000f5e3 spapr-tce: make sPAPRTCETable a proper device
Model TCE tables as a device that's hooked up as a child object to
the owner.  Besides the code cleanup, we get a few nice benefits:

1) free actually works now (it was dead code before)

2) the TCE information is visible in the device tree

3) we can expose table information as properties such that if we
   change the window_size, we can use globals to keep migration
   working.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-id: 1374175984-8930-6-git-send-email-aliguori@us.ibm.com
[dwg: pseries: savevm support for PAPR TCE tables]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[alexey: ppc kvm: fix to compile]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 10:37:08 -05:00
David Gibson b368a7d864 pseries: savevm support for VIO devices
This patch adds helpers to allow PAPR VIO devices to save state common
to all VIO devices during savevm.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1374175984-8930-3-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 10:37:08 -05:00
Alexander Graf 80fc95d8bd PPC: dbdma: Support unaligned DMA access
The DBDMA engine really just reads bytes from a producing device (IDE
in our case) and shoves these bytes into memory. It doesn't care whether
any alignment takes place or not.

Our code today however assumes that block accesses always happen on
sector (512 byte) boundaries. This is a fair assumption for most cases.

However, Mac OS X really likes to do unaligned, incomplete accesses
that it finishes with the next DMA request.

So we need to read / write the unaligned bits independent of the actual
asynchronous request, because that one can only handle 512-byte-aligned
data. We also need to cache these unaligned sectors until the next DMA
request, at which point the data might be successfully flushed from the
pipe.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11 18:51:25 +02:00
Alexander Graf 03ee3b1e58 PPC: dbdma: Move processing to io
Soon we will introduce intermediate processing pauses which will
allow the bottom half to restart a DMA request that couldn't be
fulfilled yet.

For that to work, move the processing variable into the io struct
which is what DMA providers work with.

While touching it, also change it into a bool

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11 18:51:25 +02:00
Alexander Graf d2f0ce2189 PPC: dbdma: Move static bh variable to device struct
The DBDMA controller has a bottom half to asynchronously process DMA
request queues.

This bh was stored as a gross static variable. Move it into the device
struct instead.

While at it, move all users of it to the new generic kick function.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11 18:51:24 +02:00
Alexander Graf d1e562deb2 PPC: dbdma: Introduce kick function
The DBDMA engine really is running all the time, waiting for input. However
we don't want to waste cycles constantly polling.

So introduce a kick function that data providers can call to notify the
DBDMA controller of new input.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11 18:51:24 +02:00
Alexander Graf f2f963fd07 PPC: dbdma: Move defines into header file
We usually keep struct and constant definitions in header files. Move
them there to stay consistent and to make access to fields easier.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-11 18:51:24 +02:00
Andreas Färber 2b927571cc intc/openpic: Build openpic only once
Since current_cpu is CPUState it no longer depends on CPUPPCState.

Move ppce500_set_mpic_proxy() to a new hw/ppc/ppc_e500.h because
hw/ppc/ppc.h is too heavily using CPUPPCState and PowerPCCPU.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-09 21:33:02 +02:00
Paolo Bonzini 84af6d9f97 spapr_iommu: pass device to spapr_tce_new_table and use it to set owner
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-04 17:42:47 +02:00
Alexander Graf a1014f25ef PPC: Add clock-frequency export for Mac machines
Support in fwcfg has been around for exposure of the clock-frequency
CPU property. OpenBIOS reads it, we just never exposed it.

Since Mac OS X is very picky about its clock frequency values, let's
just take a known good value and always expose that.

Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-01 01:11:17 +02:00
Anthony Liguori 210b580b10 spapr-rtas: add CPU argument to RTAS calls
RTAS is a hypervisor provided binary blob that a guest loads and
calls into to execute certain functions.  It's similar to the
vsyscall page in Linux or the short lived VMCI paravirt interface
from VMware.

The QEMU implementation of the RTAS blob is simply a passthrough
that proxies all RTAS calls to the hypervisor via an hypercall.

While we pass a CPU argument for hypercall handling in QEMU, we
don't pass it for RTAS calls.  Since some RTAs calls require
making hypercalls (normally RTAS is implemented as guest code) we
have nasty hacks to allow that.

Add a CPU argument to RTAS call handling so we can more easily
invoke hypercalls just as guest code would.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-01 01:11:16 +02:00
Andreas Färber dd49c038c3 intc/openpic_kvm: Fix QOM and build issues
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-01 01:11:15 +02:00
Andreas Färber e1766344fd intc/openpic: QOM'ify
Introduce type constant and cast macro.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-01 01:11:14 +02:00
Scott Wood d85937e683 kvm/openpic: in-kernel mpic support
Enables support for the in-kernel MPIC that thas been merged into the
KVM next branch.  This includes irqfd/KVM_IRQ_LINE support from Alex
Graf (along with some other improvements).

Note from Alex regarding kvm_irqchip_create():

  On x86, one would call kvm_irqchip_create() to initialize an
  in-kernel interrupt controller.  That function then goes ahead and
  initializes global capability variables as well as the default irq
  routing table.

  On ppc, we can't call kvm_irqchip_create() because we can have
  different types of interrupt controllers.  So we want to do all the
  things that function would do for us in the in-kernel device init
  handler.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: squash in kvm_irqchip_commit_routes patch, fix non-kvm build,
        fix ppcemb]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-01 01:11:14 +02:00
Scott Wood 8935a442cd openpic: factor out some common defines into openpic.h
...for use by the KVM in-kernel irqchip stub.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-07-01 01:11:14 +02:00
Paolo Bonzini df32fd1c9f dma: eliminate DMAContext
The DMAContext is a simple pointer to an AddressSpace that is now always
already available.  Make everyone hold the address space directly,
and clean up the DMA API to use the AddressSpace directly.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-20 16:39:52 +02:00
Paolo Bonzini 96478592a9 spapr_vio: take care of creating our own AddressSpace/DMAContext
Fetch the root region from the sPAPRTCETable, and use it to build
an AddressSpace and DMAContext.

Now, everywhere we have a DMAContext we also have access to the
corresponding AddressSpace (either because we create it just before
the DMAContext, or because dma_context_memory's AddressSpace is
trivially address_space_memory).

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-20 16:32:48 +02:00
Paolo Bonzini a84bb43669 spapr: use memory core for iommu support
Now we can stop using a "translating" DMAContext, but we do not yet modify
the sPAPRTCETable users to get an AddressSpace; they keep using the table
via a DMAContext.

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-20 16:32:47 +02:00
Paolo Bonzini 2b7dc949e2 spapr: convert TCE API to use an opaque type
The TCE table is currently returned as a DMAContext, and non-type-safe
APIs are called later passing back the DMAContext.  Since we want to move
away from DMAContext, use an opaque type instead, and add an accessor
to retrieve the DMAContext from it.

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-20 16:32:47 +02:00
Paolo Bonzini 0d09e41a51 hw: move headers to include/
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-04-08 18:13:10 +02:00