Commit Graph

67607 Commits

Author SHA1 Message Date
Daniel P. Berrange 8953caf3cd authz: add QAuthZPAM object type for authorizing using PAM
Add an authorization backend that talks to PAM to check whether the user
identity is allowed. This only uses the PAM account validation facility,
which is essentially just a check to see if the provided username is permitted
access. It doesn't use the authentication or session parts of PAM, since
that's dealt with by the relevant part of QEMU (eg VNC server).

Consider starting QEMU with a VNC server and telling it to use TLS with
x509 client certificates and configuring it to use an PAM to validate
the x509 distinguished name. In this example we're telling it to use PAM
for the QAuthZ impl with a service name of "qemu-vnc"

 $ qemu-system-x86_64 \
     -object tls-creds-x509,id=tls0,dir=/home/berrange/security/qemutls,\
             endpoint=server,verify-peer=yes \
     -object authz-pam,id=authz0,service=qemu-vnc \
     -vnc :1,tls-creds=tls0,tls-authz=authz0

This requires an /etc/pam/qemu-vnc file to be created with the auth
rules. A very simple file based whitelist can be setup using

  $ cat > /etc/pam/qemu-vnc <<EOF
  account         requisite       pam_listfile.so item=user sense=allow file=/etc/qemu/vnc.allow
  EOF

The /etc/qemu/vnc.allow file simply contains one username per line. Any
username not in the file is denied. The usernames in this example are
the x509 distinguished name from the client's x509 cert.

  $ cat > /etc/qemu/vnc.allow <<EOF
  CN=laptop.berrange.com,O=Berrange Home,L=London,ST=London,C=GB
  EOF

More interesting would be to configure PAM to use an LDAP backend, so
that the QEMU authorization check data can be centralized instead of
requiring each compute host to have file maintained.

The main limitation with this PAM module is that the rules apply to all
QEMU instances on the host. Setting up different rules per VM, would
require creating a separate PAM service name & config file for every
guest. An alternative approach for the future might be to not pass in
the plain username to PAM, but instead combine the VM name or UUID with
the username. This requires further consideration though.

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26 15:32:19 +00:00
Daniel P. Berrangé 55d869846d authz: add QAuthZListFile object type for a file access control list
Add a QAuthZListFile object type that implements the QAuthZ interface. This
built-in implementation is a proxy around the QAuthZList object type,
initializing it from an external file, and optionally, automatically
reloading it whenever it changes.

To create an instance of this object via the QMP monitor, the syntax
used would be:

      {
        "execute": "object-add",
        "arguments": {
          "qom-type": "authz-list-file",
          "id": "authz0",
          "props": {
            "filename": "/etc/qemu/vnc.acl",
	    "refresh": true
          }
        }
      }

If "refresh" is "yes", inotify is used to monitor the file,
automatically reloading changes. If an error occurs during reloading,
all authorizations will fail until the file is next successfully
loaded.

The /etc/qemu/vnc.acl file would contain a JSON representation of a
QAuthZList object

    {
      "rules": [
         { "match": "fred", "policy": "allow", "format": "exact" },
         { "match": "bob", "policy": "allow", "format": "exact" },
         { "match": "danb", "policy": "deny", "format": "glob" },
         { "match": "dan*", "policy": "allow", "format": "exact" },
      ],
      "policy": "deny"
    }

This sets up an authorization rule that allows 'fred', 'bob' and anyone
whose name starts with 'dan', except for 'danb'. Everyone unmatched is
denied.

The object can be loaded on the comand line using

   -object authz-list-file,id=authz0,filename=/etc/qemu/vnc.acl,refresh=yes

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-02-26 15:32:18 +00:00
Daniel P. Berrange c8c99887d1 authz: add QAuthZList object type for an access control list
Add a QAuthZList object type that implements the QAuthZ interface. This
built-in implementation maintains a trivial access control list with a
sequence of match rules and a final default policy. This replicates the
functionality currently provided by the qemu_acl module.

To create an instance of this object via the QMP monitor, the syntax
used would be:

  {
    "execute": "object-add",
    "arguments": {
      "qom-type": "authz-list",
      "id": "authz0",
      "props": {
        "rules": [
           { "match": "fred", "policy": "allow", "format": "exact" },
           { "match": "bob", "policy": "allow", "format": "exact" },
           { "match": "danb", "policy": "deny", "format": "glob" },
           { "match": "dan*", "policy": "allow", "format": "exact" },
        ],
        "policy": "deny"
      }
    }
  }

This sets up an authorization rule that allows 'fred', 'bob' and anyone
whose name starts with 'dan', except for 'danb'. Everyone unmatched is
denied.

It is not currently possible to create this via -object, since there is
no syntax supported to specify non-scalar properties for objects. This
is likely to be addressed by later support for using JSON with -object,
or an equivalent approach.

In any case the future "authz-listfile" object can be used from the
CLI and is likely a better choice, as it allows the ACL to be refreshed
automatically on change.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26 15:32:18 +00:00
Daniel P. Berrangé fb5c4ebc08 authz: add QAuthZSimple object type for easy whitelist auth checks
In many cases a single VM will just need to whitelist a single identity
as the allowed user of network services. This is especially the case for
TLS live migration (optionally with NBD storage) where we just need to
whitelist the x509 certificate distinguished name of the source QEMU
host.

Via QMP this can be configured with:

  {
    "execute": "object-add",
    "arguments": {
      "qom-type": "authz-simple",
      "id": "authz0",
      "props": {
        "identity": "fred"
      }
    }
  }

Or via the command line

  -object authz-simple,id=authz0,identity=fred

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26 15:25:58 +00:00
Daniel P. Berrange 5b76dd132c authz: add QAuthZ object as an authorization base class
The current qemu_acl module provides a simple access control list
facility inside QEMU, which is used via a set of monitor commands
acl_show, acl_policy, acl_add, acl_remove & acl_reset.

Note there is no ability to create ACLs - the network services (eg VNC
server) were expected to create ACLs that they want to check.

There is also no way to define ACLs on the command line, nor potentially
integrate with external authorization systems like polkit, pam, ldap
lookup, etc.

The QAuthZ object defines a minimal abstract QOM class that can be
subclassed for creating different authorization providers.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26 15:25:58 +00:00
Daniel P. Berrangé 47287c27d0 hw/usb: switch MTP to use new inotify APIs
The internal inotify APIs allow a lot of conditional statements to be
cleared out, and provide a simpler callback for handling events.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-02-26 15:25:58 +00:00
Daniel P. Berrangé 888e0359bf hw/usb: fix const-ness for string params in MTP driver
Various functions accepting 'char *' string parameters were missing
'const' qualifiers.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-02-26 15:25:58 +00:00
Daniel P. Berrangé 3c48baf1d4 hw/usb: don't set IN_ISDIR for inotify watch in MTP driver
IN_ISDIR is not a bit that one can request when registering a
watch with inotify_add_watch. Rather it is a bit that is set
automatically when reading events from the kernel.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-02-26 15:25:58 +00:00
Daniel P. Berrangé 6134d7522e qom: don't require user creatable objects to be registered
When an object is in turn owned by another user object, it is not
desirable to expose this in the QOM object hierarchy. It is just an
internal implementation detail, we should be free to change without
exposure to apps managing QEMU.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-02-26 15:25:58 +00:00
Daniel P. Berrangé 90e33dfec6 util: add helper APIs for dealing with inotify in portable manner
The inotify userspace API for reading events is quite horrible, so it is
useful to wrap it in a more friendly API to avoid duplicating code
across many users in QEMU. Wrapping it also allows introduction of a
platform portability layer, so that we can add impls for non-Linux based
equivalents in future.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-02-26 15:25:58 +00:00
Alex Bennée bf30e8662c tests/Makefile.include: test all rounding modes of softfloat
We missed a bug in a recent patch as we were not testing all the
rounding modes for all operations. However enabling all rounding modes
for mulAdd does slow down the already slowest test and doesn't really
buy us much additional coverage so lets allow the default test flags
to be overridden.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-26 14:08:03 +00:00
Richard Henderson 5d64abb32f softfloat: Support float_round_to_odd more places
Previously this was only supported for roundAndPackFloat64.

New support in round_canonical, round_to_int, float128_round_to_int,
roundAndPackFloat32, roundAndPackInt32, roundAndPackInt64,
roundAndPackUint64.  This does not include any of the floatx80 routines,
as we do not have users for that rounding mode there.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190215170225.15537-1-richard.henderson@linaro.org>
Tested-by: David Hildenbrand <david@redhat.com>
[AJB: add missing break]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2019-02-26 14:08:03 +00:00
Alex Bennée dc3f8a9dcf tests/fp: enable f128_to_ui[32/64] tests in float-to-uint
We've just added f128_to_ui32 and we missed out the f128_to_ui64 tests
last time.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-26 14:07:34 +00:00
Alex Bennée 80d491fea3 tests/fp: add wrapping for f128_to_ui32
Needed to test: softfloat: Implement float128_to_uint32

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-26 14:05:19 +00:00
David Hildenbrand e45de9922e softfloat: Implement float128_to_uint32
Handling it just like float128_to_uint32_round_to_zero, that hopefully
is free of bugs :)

Documentation basically copied from float128_to_uint64

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2019-02-26 14:05:19 +00:00
David Hildenbrand 4739318160 softfloat: add float128_is_{normal,denormal}
Needed on s390x, to test for the data class of a number. So it will
gain soon a user.

A number is considered normal if the exponent is neither 0 nor all 1's.
That can be checked by adding 1 to the exponent, and comparing against
>= 2 after dropping an eventual overflow into the sign bit.

While at it, convert the other floatXX_is_normal functions to use a
similar, less error prone calculation, as suggested by Richard H.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2019-02-26 14:05:19 +00:00
Eric Blake 3acf04f899 tests: Ignore fp test outputs
Commit 2cade3d wired up new tests, but did not exclude the
new *.out files produced by running the tests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2019-02-26 14:05:19 +00:00
Murilo Opsfelder Araujo b268a6162d ppc/pnv: use IEC binary prefixes to represent sizes
Using IEC binary prefixes from qemu/units.h provides a more human-friendly value
to size constants.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Message-Id: <20190225170155.1972-4-muriloo@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 14:20:30 +11:00
Murilo Opsfelder Araujo 584ea7e76f ppc/pnv: add INITRD_MAX_SIZE constant
The current 0x10000000 value is actually 256MiB, not 128MB as the comment
suggests. Move it to a constant and fix the comment (no change in the size
value).

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Message-Id: <20190225170155.1972-3-muriloo@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 14:20:30 +11:00
Murilo Opsfelder Araujo b45b56baee ppc/pnv: increase kernel size limit to 256MiB
Building kernel with CONFIG_DEBUG_INFO_REDUCED can generate a ~90MB image and
building with CONFIG_DEBUG_INFO can generate a ~225M one, both exceeds the
current limit of 32MiB.

Increasing kernel size limit to 256MiB should fit for now.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Message-Id: <20190225170155.1972-2-muriloo@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 14:20:30 +11:00
Thomas Huth f6d4dca807 hw/ppc: Use object_initialize_child for correct reference counting
Both functions, object_initialize() and object_property_add_child() increase
the reference counter of the new object, so one of the references has to be
dropped afterwards to get the reference counting right. Otherwise the child
object will not be properly cleaned up when the parent gets destroyed.
Thus let's use now object_initialize_child() instead to get the reference
counting here right.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1550748288-30598-1-git-send-email-thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Cédric Le Goater 3dbe65c178 ppc/xive: xive does not have a POWER7 interrupt model
Patch "target/ppc: Add POWER9 external interrupt model" should have
removed the section covering PPC_FLAGS_INPUT_POWER7.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190219142530.17807-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 9bcb5b2941 tests/device-plug: Add PHB unplug request test for spapr
We can easily test this, just like PCI. PHB unplug is not supported
on s390x and x86 ACPI.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059673939.1466090.14354001937819612724.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth dae5e39ada spapr: enable PHB hotplug for default pseries machine type
The 'dr_phb_enabled' field of that class can be set as part of
machine-specific init code. It will be used to conditionally
enable creation of DRC objects and device-tree description to
facilitate hotplug of PHBs.

Since we can't migrate this state to older machine types,
default the option to true and disable it for older machine
types.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059673433.1466090.6188091133769611501.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz bb2bdd812e spapr: add hotplug hooks for PHB hotplug
Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register spapr machine as the handler for the
main system bus.

Provide the usual pre-plug, plug and unplug-request handlers.

Move the checking of the PHB index to the pre-plug handler. It is okay
to do that and assert in the realize function because the pre-plug
handler is always called, even for the oldest machine types we support.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(Fixed interrupt controller phandle in "interrupt-map" and
 TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth f130928d2a spapr_pci: add ibm, my-drc-index property for PHB hotplug
This is needed to denote a boot-time PHB as being hot-pluggable.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059672420.1466090.15147504040270659866.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth 0a0a66cd1b spapr_pci: provide node start offset via spapr_populate_pci_dt()
PHB hotplug re-uses PHB device tree generation code and passes
it to a guest via RTAS. Doing this requires knowledge of where
exactly in the device tree the node describing the PHB begins.

Provide this via a new optional pointer that can be used to
store the PHB node's start offset.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059671912.1466090.10891589403973703473.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth 4b6d336f2c spapr_events: add support for phb hotplug events
Extend the existing EPOW event format we use for PCI
devices to emit PHB plug/unplug events.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059671405.1466090.535964535260503283.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Nathan Fontenot 3998ccd092 spapr: populate PHB DRC entries for root DT node
This add entries to the root OF node to advertise our PHBs as being
DR-capable in accordance with PAPR specification.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059670897.1466090.10843921337591637414.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth 962b6c3650 spapr: create DR connectors for PHBs
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz ef28b98d58 spapr_pci: add PHB unrealize
To support PHB hotplug we need to clean up lingering references,
memory, child properties, etc. prior to the PHB object being
finalized. Generally this will be called as a result of calling
object_unparent() on the PHB object, which in turn would normally
be called as the result of an unplug() operation.

When the PHB is finalized, child objects will be unparented in
turn, and finalized if the PHB was the only reference holder. so
we don't bother to explicitly unparent child objects of the PHB,
with the notable exception of DRCs. This is needed to avoid a QEMU
crash when unplugging a PHB and resetting the machine before the
guest could handle the event. The DRCs are removed from the QOM tree
by  pci_unregister_root_bus() and we must make sure we're not leaving
stale aliases under the global /dr-connector path.

The formula that gives the number of DMA windows is moved to an
inline function in the hw/pci-host/spapr.h header because it
will have other users.

The unrealize function is able to cope with partially realized PHBs.
It is hence used to implement proper rollback on the realize error
path.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059669881.1466090.13515030705986041517.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz ad62bff638 spapr_irq: Expose the phandle of the interrupt controller
This will be used by PHB hotplug in order to create the "interrupt-map"
property of the PHB node.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 743ed566c1 spapr: Expose the name of the interrupt controller node
This will be needed by PHB hotplug in order to access the "phandle"
property of the interrupt controller node.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 6cead90c5c xics: Write source state to KVM at claim time
The pseries machine only uses LSIs to support legacy PCI devices. Every
PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming
in-kernel XIVE), QEMU synchronizes the state of all irqs, including these
LSIs, later on at machine reset.

In order to support PHB hotplug, we need a way to tell KVM about the LSIs
that doesn't require a machine reset. An easy way to do that is to always
inform KVM when an interrupt is claimed, which really isn't a performance
path.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 09d876ce2c spapr/drc: Drop spapr_drc_attach() fdt argument
All DRC subtypes have been converted to generate the FDT fragment at
configure connector time instead of attach time. The fdt and fdt_offset
arguments of spapr_drc_attach() aren't needed anymore. Drop them and
make the implementation of the dt_populate() method mandatory.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 46fd02990d spapr/pci: Generate FDT fragment at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059667346.1466090.326696113231137772.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 345b12b99e spapr: Generate FDT fragment for CPUs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 62d38c9bd3 spapr: Generate FDT fragment for LMBs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz d9c95c71ac spapr_drc: Allow FDT fragment to be added later
The current logic is to provide the FDT fragment when attaching a device
to a DRC. This works perfectly fine for our current hotplug support, but
soon we will add support for PHB hotplug which has some constraints, that
CPU, PCI and LMB devices don't seem to have.

The first constraint is that the "ibm,dma-window" property of the PHB
node requires the IOMMU to be configured, ie, spapr_tce_table_enable()
has been called, which happens during PHB reset. It is okay in the case
of hotplug since the device is reset before the hotplug handler is
called. On the contrary with coldplug, the hotplug handler is called
first and device is only reset during the initial system reset. Trying
to create the FDT fragment on the hotplug path in this case, would
result in somthing like this:

ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >;

This will cause linux in the guest to panic, by simply removing and
re-adding the PHB using the drmgr command:

	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
	if (!page)
		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);

The second and maybe more problematic constraint is that the
"interrupt-map" property needs to reference the interrupt controller
node using the very same phandle that SLOF has already exposed to the
guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall
at some point to know about this phandle. With the latest QEMU and SLOF,
this happens when SLOF gets quiesced. This means that if the PHB gets
hotplugged after CAS but before SLOF quiesce, then we're sure that the
phandle is not known when the hotplug handler is called.

The FDT is only needed when the guest first invokes RTAS to configure
the connector actually, long after SLOF quiesce. Let's postpone the
creation of FDT fragments for PHBs to rtas_ibm_configure_connector().

Since we only need this for PHBs, introduce a new method in the base
DRC class for that. DRC subtypes will be converted to use it in
subsequent patches.

Allow spapr_drc_attach() to be passed a NULL fdt argument if the method
is available. When all DRC subtypes have been converted, the fdt argument
will eventually disappear.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 539c6e7358 target/ppc: Basic POWER9 bare-metal radix MMU support
No guest support yet

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-13-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 3367c62f52 target/ppc: Support for POWER9 native hash
(Might need more patch splitting)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-12-clg@kaod.org>
[dwg: Hack to fix compile with some earlier include tweaks of mine]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 79825f4d58 target/ppc: Rename PATB/PATBE -> PATE
That "b" means "base address" and thus shouldn't be in the name
of actual entries and related constants.

This patch keeps the synthetic patb_entry field of the spapr
virtual hypervisor unchanged until I figure out if that has
an impact on the migration stream.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt c4dae9cd37 target/ppc: Flush the TLB locally when the LPIDR is written
Our TCG TLB only tags whether it's a HV vs a guest access, so it must
be flushed when the LPIDR is changed.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 74c4912f09 target/ppc: Fix synchronization of mttcg with broadcast TLB flushes
Let's use the generic helper tlb_flush_all_cpus_synced() instead
of iterating the CPUs ourselves.

We do lose the optimization of clearing the "other" CPUs "need flush"
flags but this shouldn't be a problem in practice.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 34525595fb target/ppc: Add basic support for "new format" HPTE as found on POWER9
POWER9 (arch v3) slightly changes the HPTE format. The B bits move
from the first to the second half of the HPTE, and the AVPN/ARPN
are slightly shorter.

However, under SPAPR, the hypercalls still take the old format
(and probably will for the foreseable future).

The simplest way to support this is thus to convert the HPTEs from
new to old format when reading them if the MMU model is v3 and there
is no virtual hypervisor, leaving the rest of the code unchanged.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-8-clg@kaod.org>
[dwg: Moved function to .c since there was no real need for it in the .h]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 3054b0ca4b target/ppc: Fix ordering of hash MMU accesses
With mttcg, we can have MMU lookups happening at the same time
as the guest modifying the page tables.

Since the HPTEs of the hash table MMU contains two words (or
double worlds on 64-bit), we need to make sure we read them
in the right order, with the correct memory barrier.

Additionally, when using emulated SPAPR mode, the hypercalls
writing to the hash table must also perform the udpates in
the right order.

Note: This part is still not entirely correct

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 2819282dae target/ppc: Fix #include guard in mmu-book3s-v3.h
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 2b9e0a6b94 target/ppc: Re-enable RMLS on POWER9 for virtual hypervisors
Historically the 64-bit server MMU supports two way of configuring the
guest "real mode" mapping:

 - The "RMA" with is a single chunk of physically contiguous
memory remapped as guest real, and controlled by the RMLS
field in the LPCR register and the RMOR register.

 - The "VRMA" which uses special PTEs inserted in the partition
hash table by the hypervisor.

POWER9 deprecates the former, which is reflected by the filtering
done in ppc_store_lpcr() which effectively prevents setting of
the RMLS field.

However, when using fully emulated SPAPR machines, our qemu code
currently only knows how to define the guest real mode memory using
RMLS.

Thus you cannot run a SPAPR machine anymore with a POWER9 CPU
model today.

This works around it with a quirk in ppc_store_lpcr() to continue
allowing the RMLS field to be set when using a virtual hypervisor.

Ultimately we will want to implement configuring a VRMA instead
which will also be necessary if we want to migrate a SPAPR guest
between TCG and KVM but this is a lot more work.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 38c784a1cc target/ppc/mmu: Use LPCR:HR to chose radix vs. hash translation
Now that LPCR:HR is set properly for SPAPR, use it for deciding
the translation type, which also works for bare metal

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 00fd075e18 target/ppc/spapr: Set LPCR:HR when using Radix mode
The HW relies on LPCR:HR along with the PATE to determine whether
to use Radix or Hash mode. In fact it uses LPCR:HR more commonly
than the PATE.

For us, it's also more efficient to do so, especially since unlike
the HW we do not maintain a cache of the current PATE and HV PATE
in a generic place.

Prepare the grounds for that by ensuring that LPCR:HR is set
properly on SPAPR machines.

Another option would have been to use a callback to get the PATE
but this gets messy when implementing bare metal support, it's
much simpler (and faster) to use LPCR.

Since existing migration streams may not have it, fix it up in
spapr_post_load() as well based on the pseudo-PATE entry that
we keep.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00