Commit Graph

188 Commits

Author SHA1 Message Date
Eric Auger 0d570a2572 vfio/pci: Use vbasedev local variable in vfio_realize()
Using a VFIODevice handle local variable to improve the code readability.

no functional change intended

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20220502094223.36384-3-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Eric Auger 9d38ffc5d8 hw/vfio/pci: fix vfio_pci_hot_reset_result trace point
"%m" format specifier is not interpreted by the trace infrastructure
and thus "%m" is output instead of the actual errno string. Fix it by
outputting strerror(errno).

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20220502094223.36384-2-yi.l.liu@intel.com
[aw: replace commit log as provided by Eric]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike) dc580d51f7 vfio: defer to commit kvm irq routing when enable msi/msix
In migration resume phase, all unmasked msix vectors need to be
setup when loading the VF state. However, the setup operation would
take longer if the VM has more VFs and each VF has more unmasked
vectors.

The hot spot is kvm_irqchip_commit_routes, it'll scan and update
all irqfds that are already assigned each invocation, so more
vectors means need more time to process them.

vfio_pci_load_config
  vfio_msix_enable
    msix_set_vector_notifiers
      for (vector = 0; vector < dev->msix_entries_nr; vector++) {
        vfio_msix_vector_do_use
          vfio_add_kvm_msi_virq
            kvm_irqchip_commit_routes <-- expensive
      }

We can reduce the cost by only committing once outside the loop.
The routes are cached in kvm_state, we commit them first and then
bind irqfd for each vector.

The test VM has 128 vcpus and 8 VF (each one has 65 vectors),
we measure the cost of the vfio_msix_enable for each VF, and
we can see 90+% costs can be reduce.

VF      Count of irqfds[*]  Original        With this patch

1st           65            8               2
2nd           130           15              2
3rd           195           22              2
4th           260           24              3
5th           325           36              2
6th           390           44              3
7th           455           51              3
8th           520           58              4
Total                       258ms           21ms

[*] Count of irqfds
How many irqfds that already assigned and need to process in this
round.

The optimization can be applied to msi type too.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-6-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike) 75d546fc18 Revert "vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration"
Commit ecebe53fe9 ("vfio: Avoid disabling and enabling vectors
repeatedly in VFIO migration") avoids inefficiently disabling and
enabling vectors repeatedly and lets the unmasked vectors be enabled
one by one.

But we want to batch multiple routes and defer the commit, and only
commit once outside the loop of setting vector notifiers, so we
cannot enable the vectors one by one in the loop now.

Revert that commit and we will take another way in the next patch,
it can not only avoid disabling/enabling vectors repeatedly, but
also satisfy our requirement of defer to commit.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-5-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike) 8ab217d5d3 vfio: simplify the failure path in vfio_msi_enable
Use vfio_msi_disable_common to simplify the error handling
in vfio_msi_enable.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-4-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike) be4a46eccf vfio: move re-enabling INTX out of the common helper
Move re-enabling INTX out, and the callers should decide to
re-enable it or not.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-3-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike) a6f5770fb2 vfio: simplify the conditional statements in vfio_msi_enable
It's unnecessary to test against the specific return value of
VFIO_DEVICE_SET_IRQS, since any positive return is an error
indicating the number of vectors we should retry with.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-2-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Marc-André Lureau 8e3b0cbb72 Replace qemu_real_host_page variables with inlined functions
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.

This avoids the need for a constructor initialization too.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 10:50:38 +02:00
Markus Armbruster b21e238037 Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).

Patch created mechanically with:

    $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
	     --macro-file scripts/cocci-macro-file.h FILES...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
2022-03-21 15:44:44 +01:00
Longpeng(Mike) def4c5570c kvm/msi: do explicit commit when adding msi routes
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route
table, which is not efficient if we are adding lots of routes in some cases.

This patch lets callers invoke the kvm_irqchip_commit_routes(), so the
callers can decide how to optimize.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20220222141116.2091-3-longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:26:20 +01:00
Kunkun Jiang f36d4fb85f vfio/pci: Add support for mmapping sub-page MMIO BARs after live migration
We can expand MemoryRegions of sub-page MMIO BARs in
vfio_pci_write_config() to improve IO performance for some
devices. However, the MemoryRegions of destination VM are
not expanded any more after live migration. Because their
addresses have been updated in vmstate_load_state()
(vfio_pci_load_config) and vfio_sub_page_bar_update_mapping()
will not be called.

This may result in poor performance after live migration.
So iterate BARs in vfio_pci_load_config() and try to update
sub-page BARs.

Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Reported-by: Qixin Gan <ganqixin@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Link: https://lore.kernel.org/r/20211027090406.761-2-jiangkunkun@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-11-01 12:17:51 -06:00
Kevin Wolf f3558b1b76 qdev: Base object creation on QDict rather than QemuOpts
QDicts are both what QMP natively uses and what the keyval parser
produces. Going through QemuOpts isn't useful for either one, so switch
the main device creation function to QDicts. By sharing more code with
the -object/object-add code path, we can even reduce the code size a
bit.

This commit doesn't remove the detour through QemuOpts from any code
path yet, but it allows the following commits to do so.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211008133442.141332-15-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-10-15 16:11:22 +02:00
Cai Huoqing 631ba5a128 hw/vfio: Fix typo in comments
Fix typo in comments:
*programatically  ==> programmatically
*disconecting  ==> disconnecting
*mulitple  ==> multiple
*timout  ==> timeout
*regsiter  ==> register
*forumula  ==> formula

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210730012613.2198-1-caihuoqing@baidu.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 11:57:01 +02:00
Cai Huoqing 1bd9f1b14d vfio/pci: Add pba_offset PCI quirk for BAIDU KUNLUN AI processor
Fix pba_offset initialization value for BAIDU KUNLUN Virtual
Function device. The KUNLUN hardware returns an incorrect
value for the VF PBA offset, and add a quirk to instead
return a hardcoded value of 0xb400.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210713093743.942-1-caihuoqing@baidu.com
[aw: comment & whitespace tuning]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-07-14 13:47:17 -06:00
Cai Huoqing 936555bc4f vfio/pci: Change to use vfio_pci_is()
Make use of vfio_pci_is() helper function.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210713014831.742-1-caihuoqing@baidu.com
[aw: commit log wording]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-07-14 13:47:17 -06:00
Thomas Huth 4c386f8064 Do not include sysemu/sysemu.h if it's not really necessary
Stop including sysemu/sysemu.h in files that don't need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:50 +02:00
Shenming Lu ecebe53fe9 vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration
In VFIO migration resume phase and some guest startups, there are
already unmasked vectors in the vector table when calling
vfio_msix_enable(). So in order to avoid inefficiently disabling
and enabling vectors repeatedly, let's allocate all needed vectors
first and then enable these unmasked vectors one by one without
disabling.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Message-Id: <20210310030233.1133-4-lushenming@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Philippe Mathieu-Daudé 4eda914cac hw/vfio/pci-quirks: Replace the word 'blacklist'
Follow the inclusive terminology from the "Conscious Language in your
Open Source Projects" guidelines [*] and replace the word "blacklist"
appropriately.

[*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210205171817.2108907-9-philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Eduardo Habkost ce35e2295e qdev: Move softmmu properties to qdev-properties-system.h
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18 15:20:17 -05:00
Kirti Wankhede bb0990d174 vfio: Change default dirty pages tracking behavior during migration
By default dirty pages tracking is enabled during iterative phase
(pre-copy phase).
Added per device opt-out option 'x-pre-copy-dirty-page-tracking' to
disable dirty pages tracking during iterative phase. If the option
'x-pre-copy-dirty-page-tracking=off' is set for any VFIO device, dirty
pages tracking during iterative phase will be disabled.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-23 10:05:58 -07:00
Alex Williamson cf254988a5 vfio: Make migration support experimental
Support for migration of vfio devices is still in flux.  Developers
are attempting to add support for new devices and new architectures,
but none are yet readily available for validation.  We have concerns
whether we're transferring device resources at the right point in the
migration, whether we're guaranteeing that updates during pre-copy are
migrated, and whether we can provide bit-stream compatibility should
any of this change.  Even the question of whether devices should
participate in dirty page tracking during pre-copy seems contentious.
In short, migration support has not had enough soak time and it feels
premature to mark it as supported.

Create an experimental option such that we can continue to develop.

[Retaining previous acks/reviews for a previously identical code
 change with different specifics in the commit log.]

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-23 08:29:29 -07:00
Kirti Wankhede a22651053b vfio: Make vfio-pci device migration capable
If the device is not a failover primary device, call
vfio_migration_probe() and vfio_migration_finalize() to enable
migration support for those devices that support it respectively to
tear it down again.
Removed migration blocker from VFIO PCI device specific structure and use
migration blocker from generic structure of  VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede c5e2fb3ce4 vfio: Add save and load functions for VFIO PCI devices
Added functions to save and restore PCI device specific data,
specifically config space of PCI device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede e93b733bcf vfio: Add vfio_get_object callback to VFIODeviceOps
Hook vfio_get_object callback for PCI devices.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Eduardo Habkost 01b4606440 vfio: Rename PCI_VFIO to VFIO_PCI
Make the type checking macro name consistent with the TYPE_*
constant.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20200902224311.1321159-56-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 13:20:22 -04:00
Eduardo Habkost 42db0fb5e0 vfio/pci: Move QOM macros to header
This will make future conversion to OBJECT_DECLARE* easier.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-43-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-08-27 14:04:55 -04:00
Markus Armbruster af175e85f9 error: Eliminate error_propagate() with Coccinelle, part 2
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.  The previous commit did that with a Coccinelle script I
consider fairly trustworthy.  This commit uses the same script with
the matching of return taken out, i.e. we convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
    }

This is unsound: @err could still be read between afterwards.  I don't
know how to express "no read of @err without an intervening write" in
Coccinelle.  Instead, I manually double-checked for uses of @err.

Suboptimal line breaks tweaked manually.  qdev_realize() simplified
further to placate scripts/checkpatch.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-36-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 668f62ec62 error: Eliminate error_propagate() with Coccinelle, part 1
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.  Convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
        return ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
        return ...
    }

where nothing else needs @err.  Coccinelle script:

    @rule1 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
         if (
    (
    -        fun(args, &err, args2)
    +        fun(args, errp, args2)
    |
    -        !fun(args, &err, args2)
    +        !fun(args, errp, args2)
    |
    -        fun(args, &err, args2) op c1
    +        fun(args, errp, args2) op c1
    )
            )
         {
             ... when != err
                 when != lbl:
                 when strict
    -        error_propagate(errp, err);
             ... when != err
    (
             return;
    |
             return c2;
    |
             return false;
    )
         }

    @rule2 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    expression var;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
    -    var = fun(args, &err, args2);
    +    var = fun(args, errp, args2);
         ... when != err
         if (
    (
             var
    |
             !var
    |
             var op c1
    )
            )
         {
             ... when != err
                 when != lbl:
                 when strict
    -        error_propagate(errp, err);
             ... when != err
    (
             return;
    |
             return c2;
    |
             return false;
    |
             return var;
    )
         }

    @depends on rule1 || rule2@
    identifier err;
    @@
    -    Error *err = NULL;
         ... when != err

Not exactly elegant, I'm afraid.

The "when != lbl:" is necessary to avoid transforming

         if (fun(args, &err)) {
             goto out
         }
         ...
     out:
         error_propagate(errp, err);

even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().

Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly.  I don't know what exactly "when strict" does, only that
it helps here.

The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err".  For
an example where it's too narrow, see vfio_intx_enable().

Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there.  Converted manually.

Line breaks tidied up manually.  One nested declaration of @local_err
deleted manually.  Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
David Hildenbrand aff92b8286 vfio: Convert to ram_block_discard_disable()
VFIO is (except devices without a physical IOMMU or some mediated devices)
incompatible with discarding of RAM. The kernel will pin basically all VM
memory. Let's convert to ram_block_discard_disable(), which can now
fail, in contrast to qemu_balloon_inhibit().

Leave "x-balloon-allowed" named as it is for now.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200626072248.78761-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-07-02 05:54:59 -04:00
Peter Xu 97a3757616 vfio/pci: Use kvm_irqchip_add_irqfd_notifier_gsi() for irqfds
VFIO is currently the only one left that is not using the generic
function (kvm_irqchip_add_irqfd_notifier_gsi()) to register irqfds.
Let VFIO use the common framework too.

Follow up patches will introduce extra features for kvm irqfd, so that
VFIO can easily leverage that after the switch.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200318145204.74483-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:28 -04:00
Markus Armbruster 40c2281cc3 Drop more @errp parameters after previous commit
Several functions can't fail anymore: ich9_pm_add_properties(),
device_add_bootindex_property(), ppc_compat_add_property(),
spapr_caps_add_properties(), PropertyInfo.create().  Drop their @errp
parameter.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-16-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Marc-André Lureau 4f67d30b5e qdev: set properties with device_class_set_props()
The following patch will need to handle properties registration during
class_init time. Let's use a device_class_set_props() setter.

spatch --macro-file scripts/cocci-macro-file.h  --sp-file
./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place
--dir .

@@
typedef DeviceClass;
DeviceClass *d;
expression val;
@@
- d->props = val
+ device_class_set_props(d, val)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 20:59:15 +01:00
Peter Xu 0446f81217 vfio/pci: Don't remove irqchip notifier if not registered
The kvm irqchip notifier is only registered if the device supports
INTx, however it's unconditionally removed.  If the assigned device
does not support INTx, this will cause QEMU to crash when unplugging
the device from the system.  Change it to conditionally remove the
notifier only if the notify hook is setup.

CC: Eduardo Habkost <ehabkost@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # v4.2
Reported-by: yanghliu@redhat.com
Debugged-by: Eduardo Habkost <ehabkost@redhat.com>
Fixes: c5478fea27 ("vfio/pci: Respond to KVM irqchip change notifier")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1782678
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-01-06 14:19:42 -07:00
David Gibson c5478fea27 vfio/pci: Respond to KVM irqchip change notifier
VFIO PCI devices already respond to the pci intx routing notifier, in order
to update kernel irqchip mappings when routing is updated.  However this
won't handle the case where the irqchip itself is replaced by a different
model while retaining the same routing.  This case can happen on
the pseries machine type due to PAPR feature negotiation.

To handle that case, add a handler for the irqchip change notifier, which
does much the same thing as the routing notifier, but is unconditional,
rather than being a no-op when the routing hasn't changed.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-26 10:11:30 +11:00
David Gibson ad54dbd89d vfio/pci: Split vfio_intx_update()
This splits the vfio_intx_update() function into one part doing the actual
reconnection with the KVM irqchip (vfio_intx_update(), now taking an
argument with the new routing) and vfio_intx_routing_notifier() which
handles calls to the pci device intx routing notifier and calling
vfio_intx_update() when necessary.  This will make adding support for the
irqchip change notifier easier.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-26 10:11:30 +11:00
Jens Freimann ed92369a57 vfio: don't ignore return value of migrate_add_blocker
When an error occurs in migrate_add_blocker() it sets a
negative return value and uses error pointer we pass in.
Instead of just looking at the error pointer check for a negative return
value and avoid a coverity error because the return value is
set but never used. This fixes CID 1407219.

Reported-by: Coverity (CID 1407219)
Fixes: f045a0104c ("vfio: unplug failover primary device before migration")
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-18 10:41:48 -07:00
Michal Privoznik 1335d64323 hw/vfio/pci: Fix double free of migration_blocker
When user tries to hotplug a VFIO device, but the operation fails
somewhere in the middle (in my testing it failed because of
RLIMIT_MEMLOCK forbidding more memory allocation), then a double
free occurs. In vfio_realize() the vdev->migration_blocker is
allocated, then something goes wrong which causes control to jump
onto 'error' label where the error is freed. But the pointer is
left pointing to invalid memory. Later, when
vfio_instance_finalize() is called, the memory is freed again.

In my testing the second hunk was sufficient to fix the bug, but
I figured the first hunk doesn't hurt either.

==169952== Invalid read of size 8
==169952==    at 0xA47DCD: error_free (error.c:266)
==169952==    by 0x4E0A18: vfio_instance_finalize (pci.c:3040)
==169952==    by 0x8DF74C: object_deinit (object.c:606)
==169952==    by 0x8DF7BE: object_finalize (object.c:620)
==169952==    by 0x8E0757: object_unref (object.c:1074)
==169952==    by 0x45079C: memory_region_unref (memory.c:1779)
==169952==    by 0x45376B: do_address_space_destroy (memory.c:2793)
==169952==    by 0xA5C600: call_rcu_thread (rcu.c:283)
==169952==    by 0xA427CB: qemu_thread_start (qemu-thread-posix.c:519)
==169952==    by 0x80A8457: start_thread (in /lib64/libpthread-2.29.so)
==169952==    by 0x81C96EE: clone (in /lib64/libc-2.29.so)
==169952==  Address 0x143137e0 is 0 bytes inside a block of size 48 free'd
==169952==    at 0x4A342BB: free (vg_replace_malloc.c:530)
==169952==    by 0xA47E05: error_free (error.c:270)
==169952==    by 0x4E0945: vfio_realize (pci.c:3025)
==169952==    by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==    by 0x689B9A: device_set_realized (qdev.c:876)
==169952==    by 0x8E2C80: property_set_bool (object.c:2080)
==169952==    by 0x8E0EF6: object_property_set (object.c:1272)
==169952==    by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==    by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==    by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)
==169952==    by 0x5E81E5: qmp_device_add (qdev-monitor.c:798)
==169952==    by 0x9E18A8: do_qmp_dispatch (qmp-dispatch.c:132)
==169952==  Block was alloc'd at
==169952==    at 0x4A35476: calloc (vg_replace_malloc.c:752)
==169952==    by 0x51B1158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6)
==169952==    by 0xA47357: error_setv (error.c:61)
==169952==    by 0xA475D9: error_setg_internal (error.c:97)
==169952==    by 0x4DF8C2: vfio_realize (pci.c:2737)
==169952==    by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==    by 0x689B9A: device_set_realized (qdev.c:876)
==169952==    by 0x8E2C80: property_set_bool (object.c:2080)
==169952==    by 0x8E0EF6: object_property_set (object.c:1272)
==169952==    by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==    by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==    by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)

Fixes: f045a0104c ("vfio: unplug failover primary device before migration")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-18 10:41:47 -07:00
Jens Freimann f045a0104c vfio: unplug failover primary device before migration
As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20191029114905.6856-12-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-29 18:55:26 -04:00
Evgeny Yakovlev d964d3b5ab hw/vfio/pci: fix double free in vfio_msi_disable
The following guest behaviour patter leads to double free in VFIO PCI:

1. Guest enables MSI interrupts
vfio_msi_enable is called, but fails in vfio_enable_vectors.
In our case this was because VFIO GPU device was in D3 state.
Unhappy path in vfio_msi_enable will g_free(vdev->msi_vectors) but not
set this pointer to NULL

2. Guest still sees MSI an enabled after that because emulated config
write is done in vfio_pci_write_config unconditionally before calling
vfio_msi_enable

3. Guest disables MSI interrupts
vfio_msi_disable is called and tries to g_free(vdev->msi_vectors)
in vfio_msi_disable_common => double free

Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-10-10 11:07:28 -06:00
Chen Zhang f75ca62723 vfio: fix a typo
Signed-off-by: Chen Zhang <tgfbeta@me.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <8E5A9C27-C76D-46CF-85B0-79121A00B05F@me.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-09-19 11:50:37 +02:00
Markus Armbruster 54d31236b9 sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related
to the system-emulator.  Evidence:

* It's included widely: in my "build everything" tree, changing
  sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600
  objects (not counting tests and objects that don't depend on
  qemu/osdep.h, down from 5400 due to the previous two commits).

* It pulls in more than a dozen additional headers.

Split stuff related to run state management into its own header
sysemu/runstate.h.

Touching sysemu/sysemu.h now recompiles some 850 objects.  qemu/uuid.h
also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400
to 4200.  Touching new sysemu/runstate.h recompiles some 500 objects.

Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also
add qemu/main-loop.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 13:37:36 +02:00
Markus Armbruster a27bd6c779 Include hw/qdev-properties.h less
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h.  Include hw/qdev-core.h there
instead.

hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.

While there, delete a few superfluous inclusions of hw/qdev-core.h.

Touching hw/qdev-properties.h now recompiles some 1200 objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster db72581598 Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.

Include qemu/main-loop.h only where it's needed.  Touching it now
recompiles only some 1700 objects.  For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
others, they shrink only slightly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster 650d103d3e Include hw/hw.h exactly where needed
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

The previous commits have left only the declaration of hw_error() in
hw/hw.h.  This permits dropping most of its inclusions.  Touching it
now recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster d645427057 Include migration/vmstate.h less
In my "build everything" tree, changing migration/vmstate.h triggers a
recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

hw/hw.h supposedly includes it for convenience.  Several other headers
include it just to get VMStateDescription.  The previous commit made
that unnecessary.

Include migration/vmstate.h only where it's still needed.  Touching it
now recompiles only some 1600 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-16-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Eric Auger 5053bd7811 vfio/pci: Trace vfio_set_irq_signaling() failure in vfio_msix_vector_release()
Report an error in case we fail to set a trigger action
on any VFIO_PCI_MSIX_IRQ_INDEX subindex. This might be
useful in debugging a device that is not working properly.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Coverity (CID 1402196)
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-07-02 13:16:29 -06:00
Eric Auger 201a733145 vfio/common: Introduce vfio_set_irq_signaling helper
The code used to assign an interrupt index/subindex to an
eventfd is duplicated many times. Let's introduce an helper that
allows to set/unset the signaling for an ACTION_TRIGGER,
ACTION_MASK or ACTION_UNMASK action.

In the error message, we now use errno in case of any
VFIO_DEVICE_SET_IRQS ioctl failure.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 09:57:37 -06:00
Alex Williamson c60807dea5 vfio/pci: Allow MSI-X relocation to fixup bogus PBA
The MSI-X relocation code can sometimes be used to work around bogus
MSI-X capabilities, but this test for whether the PBA is outside of
the specified BAR causes the device to error before we can apply a
relocation.  Let it proceed if we intend to relocate MSI-X anyway.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 09:57:36 -06:00
Alex Williamson 3412d8ec98 vfio/pci: Hide Resizable BAR capability
The resizable BAR capability is currently exposed read-only from the
kernel and we don't yet implement a protocol for virtualizing it to
the VM.  Exposing it to the guest read-only introduces poor behavior
as the guest has no reason to test that a control register write is
accepted by the hardware.  This can lead to cases where the guest OS
assumes the BAR has been resized, but it hasn't.  This has been
observed when assigning AMD Vega GPUs.

Note, this does not preclude future enablement of resizable BARs, but
it's currently incorrect to expose this capability as read-only, so
better to not expose it at all.

Reported-by: James Courtier-Dutton <james.dutton@gmail.com>
Tested-by: James Courtier-Dutton <james.dutton@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 09:57:36 -06:00
Markus Armbruster 0b8fa32f55 Include qemu/module.h where needed, drop it from qemu-common.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-4-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c
hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c;
ui/cocoa.m fixed up]
2019-06-12 13:18:33 +02:00