Currently the virtio balloon device, when using the virtio-pci interface,
advertises itself with PCI class code MEMORY_RAM. This is wrong; the
balloon is vaguely related to memory, but is nothing like a PCI memory
device in the meaning of the class code, and this code is not required
or suggested by the virtio PCI specification.
Worse, this class code causes problems on the pseries machine, because the
firmware, seeing this class code, advertises the device as memory in the
device tree, and then a guest kernel bug causes it to see this "memory"
before the real system memory, leading to a crash in early boot.
This patch fixes the problem by removing the bogus PCI class code on the
balloon device. The backwards compatibility PC machines get new compat
properties so that they don't change.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Replace device_init() with generalized type_init().
While at it, unify naming convention: type_init([$prefix_]register_types)
Also, type_init() is a function, so add preceding blank line where
necessary and don't put a semicolon after the closing brace.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: malc <av1474@comtv.ru>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Limit them to the device_add functionality. Device aliases were a hack based
on the fact that virtio was modeled the wrong way. The mechanism for aliasing
is very limited in that only one alias can exist for any device.
We have to support it for the purposes of compatibility but we only need to
support it in device_add so restrict it to that piece of code.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
v1 -> v2
- Use a table for aliases (Paolo)
This was done in a mostly automated fashion. I did it in three steps and then
rebased it into a single step which avoids repeatedly touching every file in
the tree.
The first step was a sed-based addition of the parent type to the subclass
registration functions.
The second step was another sed-based removal of subclass registration functions
while also adding virtual functions from the base class into a class_init
function as appropriate.
Finally, a python script was used to convert the DeviceInfo structures and
qdev_register_subclass functions to TypeInfo structures, class_init functions,
and type_register_static calls.
We are almost fully converted to QOM after this commit.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
These are various small stylistic changes which help make things more
consistent such that the automated conversion script can be simpler.
It's not necessary to agree or disagree with these style changes because all
of this code is going to be rewritten by the patch monkey script anyway.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The virtio config area in PIO space is a bit special. The initial
header is little endian but the rest (device specific) is guest
native endian.
The PIO accessors for PCI on machines that don't have native IO ports
assume that all PIO is little endian, which works fine for everything
except the above.
A complicated way to fix it would be to split the BAR into two memory
regions with different endianness settings, but this isn't practical
to do. Besides, the PIO code doesn't honor region endianness anyway
(I have a patch for that too but it isn't necessary at this stage).
So I decided to go for the quick fix instead which consists of
reverting the swap in virtio-pci in selected places, hoping that when
we eventually do a "v2" of the virtio protocols, we sort that out once
and for all using a fixed endian setting for everything.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf: keep virtio in libhw and determine endianness through a
helper function in exec.c]
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
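The "revert the swap in selected places" approach described for the virtio config area can be sketched roughly as follows. This is only an illustration under stated assumptions: CONFIG_HEADER_LEN, swap16() and config_fixup16() are hypothetical names, and the header length of 20 bytes is an assumed value, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of the little-endian header at the start of the config area. */
#define CONFIG_HEADER_LEN 20u

/* Byte-swap a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * 'le_val' is a value the PIO layer has already treated as little endian.
 * For offsets in the device-specific area on a big-endian guest, undo that
 * swap so the guest sees its native byte order. Header offsets are left
 * alone because the header really is little endian.
 */
static uint16_t config_fixup16(uint32_t offset, uint16_t le_val,
                               bool guest_big_endian)
{
    if (offset >= CONFIG_HEADER_LEN && guest_big_endian) {
        return swap16(le_val);
    }
    return le_val;
}
```

With these assumptions, a 16-bit access at offset 0 passes through unchanged, while the same access at offset 20 on a big-endian guest is swapped back.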
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
vdev->guest_features is not masking features that are not supported by
the guest. Fix this by introducing a common wrapper to be used by all
virtio bus implementations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
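A common wrapper for setting guest features might look roughly like the sketch below. It assumes the wrapper masks the value written by the guest against the features the device offers; struct vdev_features and set_guest_features() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device state; only the two feature
 * words relevant to negotiation are modelled. */
struct vdev_features {
    uint32_t host_features;   /* what the device offers */
    uint32_t guest_features;  /* what the guest has accepted */
};

/*
 * Store the guest's feature word, masked to the offered bits.
 * Returns 0 if every requested bit was offered, -1 otherwise, so a
 * bus implementation can report a misbehaving guest.
 */
static int set_guest_features(struct vdev_features *f, uint32_t val)
{
    int bad = (val & ~f->host_features) != 0;

    f->guest_features = val & f->host_features;
    return bad ? -1 : 0;
}
```

Because every bus implementation would go through the same function, the masking cannot be forgotten in one binding.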
Currently, virtio devices are usually presented to the guest as an
emulated PCI device, virtio_pci. Although the actual IO operations
are done through system memory, the configuration of the virtio device
is done through the one PCI IO space BAR that virtio_pci presents.
But PCI IO space (aka PIO) is deprecated for modern PCI devices, and
on some systems with many PCI domains accessing PIO space can be
problematic. For example on the existing PowerVM implementation of
the PAPR spec, PCI PIO access is not supported at all. We're hoping
that our KVM implementation will support PCI PIO (once we support PCI
at all), but it will probably have some irritating limitations.
This patch, therefore, extends the virtio_pci device to have a PCI
memory space (MMIO) BAR as well as the IO BAR. The MMIO BAR contains
exactly the same registers, in exactly the same layout as the existing
PIO BAR.
Because the PIO BAR is still present, existing guest drivers should
still work fine. With this change in place, future guest drivers can
check for an MMIO BAR and use that if present (falling back to PIO
when possible to support older qemu versions).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The msix table is defined as a subregion, to allow for a BAR that
mixes device specific regions with the msix table.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add an exit handler that will free up RAM after a virtio-balloon device
is unplugged.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Multiple balloon registrations are not allowed; check if the
registration with the qemu balloon api succeeded. If not, fail the
device init.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
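The single-registration rule for the balloon, and the matching cleanup on unplug, can be illustrated with a small sketch. All names here (balloon_add_handler, balloon_remove_handler, balloon_device_init, balloon_device_exit, demo_handler) are hypothetical stand-ins rather than the actual qemu balloon api, and the exit path only drops the registration instead of freeing real device memory.

```c
#include <stddef.h>

typedef void (*balloon_handler)(void *opaque, unsigned long target);

/* At most one balloon handler may be registered at a time. */
static balloon_handler active_handler;
static void *active_opaque;

/* Returns 0 on success, -1 if a handler is already registered. */
static int balloon_add_handler(balloon_handler fn, void *opaque)
{
    if (active_handler != NULL) {
        return -1;
    }
    active_handler = fn;
    active_opaque = opaque;
    return 0;
}

/* Drop the registration, but only if it belongs to this device. */
static void balloon_remove_handler(void *opaque)
{
    if (active_opaque == opaque) {
        active_handler = NULL;
        active_opaque = NULL;
    }
}

/* Placeholder handler; a real device would resize the balloon here. */
static void demo_handler(void *opaque, unsigned long target)
{
    (void)opaque;
    (void)target;
}

/* Device init fails if the registration is refused. */
static int balloon_device_init(void *dev)
{
    return balloon_add_handler(demo_handler, dev) < 0 ? -1 : 0;
}

/* Exit handler run on unplug; releases the registration. */
static void balloon_device_exit(void *dev)
{
    balloon_remove_handler(dev);
}
```

In this model a second balloon device fails to initialize until the first one has been unplugged.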
In practice, guests don't generate config requests
that cross a word boundary, so the logic to
detect command word access is correct because
PCI_COMMAND is 0x4. But depending on this is
tricky. Further, it will break with guests
that do try to generate a misaligned access,
as we pass it to devices without splitting.
Better to use the generic range_covers_byte for this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
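To show why a range check is more robust than relying on alignment, here is a stand-alone sketch of a byte-coverage test. covers_byte() and write_touches_command() are illustrative names written for this example; they mirror the idea of range_covers_byte but are not copied from the tree.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND 0x04  /* offset of the command word in config space */

/* True if the access [offset, offset + len) includes 'byte'; len must be > 0. */
static bool covers_byte(uint64_t offset, uint64_t len, uint64_t byte)
{
    return offset <= byte && byte - offset < len;
}

/* True if a config write of 'len' bytes at 'addr' touches the first
 * byte of the command register, whatever its alignment. */
static bool write_touches_command(uint64_t addr, uint64_t len)
{
    return covers_byte(addr, len, PCI_COMMAND);
}
```

A 2-byte write at offset 3 is misaligned but still reaches offset 4, and this check catches it, whereas an equality test on the address would not.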
It needs to be a qdev property, because it belongs to the drive's
guest part. Precedent: commits a0fef654 and 6ced55a5.
Bonus: info qtree now shows the serial number.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The virtio_queue_notify() function checks that the virtqueue number is
less than the maximum number of virtqueues. A signed comparison is used
but the virtqueue number could be negative if a buggy or malicious guest
is run. This results in memory accesses outside of the virtqueue array.
It is risky doing input validation in common code instead of at the
guest<->host boundary. Note that virtio_queue_set_addr(),
virtio_queue_get_addr(), virtio_queue_get_num(), and many other virtio
functions do *not* validate the virtqueue number argument.
Instead of fixing the comparison in virtio_queue_notify(), move the
comparison to the virtio bindings (just like VIRTIO_PCI_QUEUE_SEL) where
we have a uint32_t value and can avoid ever calling into common virtio
code if the virtqueue number is invalid.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
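The idea of validating the virtqueue number at the guest<->host boundary, while the value is still unsigned, can be sketched like this. The array size of 64 and the names queue_notify() and handle_queue_notify_write() are assumptions made for the example.

```c
#include <stdint.h>

#define VIRTIO_PCI_QUEUE_MAX 64  /* assumed number of virtqueues */

static int notified[VIRTIO_PCI_QUEUE_MAX];

/* Stand-in for common virtio code: it trusts its argument. */
static void queue_notify(int n)
{
    notified[n]++;
}

/*
 * Binding-side handler: 'val' is the raw 32-bit value written by the
 * guest. Comparing it as unsigned rejects both too-large values and
 * values that would be negative if reinterpreted as a signed int.
 * Returns 0 if the notify was forwarded, -1 if it was dropped.
 */
static int handle_queue_notify_write(uint32_t val)
{
    if (val >= VIRTIO_PCI_QUEUE_MAX) {
        return -1;
    }
    queue_notify((int)val);
    return 0;
}
```

Since the check happens before any call into common code, functions that do not validate their argument never see an out-of-range index.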
This patch moves the 9p device registration into its own file.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
We have two different virtio buses: pci and s390. The abstraction path
taken in qemu is to have generic aliases for each device type in the
architecture specific qdev devices.
So let's make use of these aliases whenever we can and define them
whenever we can.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit c81131db15
detects old guests by comparing virtio and
PCI status. It attempts to do this on load,
as well, but load_config callback in a binding
is invoked too early and so the virtio status
isn't set yet.
We could add yet another callback to the
binding, to invoke after load, but it
seems easier to reuse the existing vmstate
callback.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Enable ioeventfd for virtio-serial devices by default. Commit
25db9ebe15 lists the benefits of using
ioeventfd.
Copying a file from guest to host over a virtio-serial channel didn't
show much difference in time or io_exit rate.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Instead of using a single variable to pass to the virtio_serial_init
function, use a struct so that expanding the number of variables to be
passed on later is easier.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
When MSI is off, each interrupt needs to be bounced through the io
thread when it's set/cleared, so vhost-net causes more context switches and
higher CPU utilization than userspace virtio which handles networking in
the same thread.
We'll need to fix this by adding level irq support in kvm irqfd,
for now disable vhost-net in these configurations.
Added a vhostforce flag to force vhost-net back on.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
- Don't return status from start/stop functions where it's ignored
- report errors to make debugging easier
- assert on unexpected failures
- don't disable notifiers on error so that we'll
retry when guest driver restarts
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Virtqueue notify is currently handled synchronously in userspace virtio. This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.
On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution. This model is similar to
how vhost receives virtqueue notifies.
The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.
Some virtio devices are known to have guest drivers which expect a notify to be
processed synchronously and spin waiting for completion.
For virtio-net, this also seems to interact with the guest stack in strange
ways so that TCP throughput for small message sizes (~200bytes)
is harmed. Only enable ioeventfd for virtio-blk for now.
Care must be taken not to interfere with vhost-net, which uses host
notifiers. If the set_host_notifier() API is used by a device
virtio-pci will disable virtio-ioeventfd and let the device deal with
host notifiers as it wishes.
Finally, there used to be a limit of 6 KVM io bus devices inside the
kernel. On such a kernel, don't use ioeventfd for virtqueue host
notification since the limit is reached too easily. This ensures that
existing vhost-net setups (which always use ioeventfd) have ioeventfds
available so they can continue to work.
After migration and on VM change state (running/paused) virtio-ioeventfd
will enable/disable itself.
* VIRTIO_CONFIG_S_DRIVER_OK -> enable virtio-ioeventfd
* !VIRTIO_CONFIG_S_DRIVER_OK -> disable virtio-ioeventfd
* virtio_pci_set_host_notifier() -> disable virtio-ioeventfd
* vm_change_state(running=0) -> disable virtio-ioeventfd
* vm_change_state(running=1) -> enable virtio-ioeventfd
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
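The enable/disable rules listed for virtio-ioeventfd can be condensed into a single predicate. This is a sketch only: struct proxy_state and ioeventfd_should_run() are hypothetical, and treating the rules as one combined condition is an interpretation of the list, not the actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 0x04  /* status bit from the virtio spec */

/* Hypothetical summary of the state the rules depend on. */
struct proxy_state {
    uint8_t status;               /* virtio device status byte */
    bool vm_running;              /* vm_change_state(running=...) */
    bool device_owns_notifiers;   /* set_host_notifier() has been used */
    bool ioeventfd_requested;     /* feature enabled for this device */
};

/* virtio-ioeventfd runs only when every rule in the list allows it. */
static bool ioeventfd_should_run(const struct proxy_state *s)
{
    return s->ioeventfd_requested &&
           !s->device_owns_notifiers &&
           s->vm_running &&
           (s->status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}
```

Re-evaluating such a predicate on each status write, VM state change and set_host_notifier() call would give the transitions in the list.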
The VirtIOPCIProxy bugs field is currently used to enable workarounds
for older guests. Rename it to flags so that other per-device behavior
can be tracked.
A later patch uses the flags field to remember whether ioeventfd should
be used for virtqueue host notification.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add "fw_name" to DeviceInfo to use in device path building. In
contrast to "name", "fw_name" should refer to the functionality the device
provides instead of a particular device model like "name" does.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This patch enables MSI-X for virtfs-9p-pci. It also adds a
compat property to pc-0.13 which turns it off there to stay
compatible with 0.13-stable.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When using irqfd with vhost-net to inject interrupts,
a single eventfd might inject multiple interrupts.
Implementing this is much easier with a single
per-device callback to set guest notifiers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Based on a patch from Mark McLoughlin, this patch introduces a new
bottom half packet transmitter that avoids the latency imposed by
the tx_timer approach. Rather than scheduling a timer when a TX
packet comes in, schedule a bottom half to be run from the iothread.
The bottom half handler first attempts to flush the queue with
notification disabled (this is where we could race with a guest
without txburst). If we flush a full burst, reschedule immediately.
If we send short of a full burst, try to re-enable notification.
To avoid a race with TXs that may have occurred, we must then
flush again. If we find some packets to send, the guest is probably
active, so we can reschedule again.
tx_timer and tx_bh are mutually exclusive, so we can re-use the
tx_waiting flag to indicate one or the other needs to be set up.
This allows us to seamlessly migrate between timer and bh TX
handling.
The bottom half handler becomes the new default and we add a new
tx= option to virtio-net-pci. Usage:
-device virtio-net-pci,tx=timer # select timer mitigation vs "bh"
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
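The bottom half logic described above (flush with notification disabled, reschedule on a full burst, otherwise re-enable notification and flush once more) can be modelled with a toy queue. struct txq, flush_tx() and tx_bh() are simplified stand-ins, and the late_arrivals field exists only to simulate packets that race with re-enabling notification.

```c
#include <stdbool.h>

/* Toy model of a TX queue. */
struct txq {
    int pending;          /* packets queued by the guest */
    int burst;            /* maximum packets sent per flush */
    bool notify_enabled;  /* whether guest notifications are enabled */
    int late_arrivals;    /* simulation: packets added during re-enable */
};

/* Send up to one burst; returns the number of packets sent. */
static int flush_tx(struct txq *q)
{
    int n = q->pending < q->burst ? q->pending : q->burst;

    q->pending -= n;
    return n;
}

/* Returns true if the bottom half should be scheduled again. */
static bool tx_bh(struct txq *q)
{
    q->notify_enabled = false;
    if (flush_tx(q) >= q->burst) {
        return true;                  /* full burst: more work is likely */
    }

    q->notify_enabled = true;         /* short burst: re-enable notification */
    q->pending += q->late_arrivals;   /* simulate the race window */
    q->late_arrivals = 0;

    if (flush_tx(q) > 0) {            /* second flush closes the race */
        q->notify_enabled = false;
        return true;                  /* guest is active: reschedule */
    }
    return false;                     /* idle: wait for the next notify */
}
```

The second flush is what prevents a packet queued during the re-enable window from sitting unnoticed until the next notification.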
If virtio_net_flush_tx() is called with notification disabled, we can
race with the guest, processing packets at the same rate as they
get produced. The trouble is that this means we have no guaranteed
exit condition from the function and can spend minutes in there.
Currently flush_tx is only called with notification on, which seems
to limit us to one pass through the queue per call. An upcoming
patch changes this.
Also add an option to set this value on the command line as different
workloads may wish to use different values. We can't necessarily
support any random value, so this is a developer option: x-txburst=
Usage:
-device virtio-net-pci,x-txburst=64 # 64 packets per tx flush
One pass through the queue (256) seems to be a good default value
for this, balancing latency with throughput. We use a signed int
for x-txburst because 2^31 packets in a burst would take many, many
minutes to process and it allows us to easily return a negative
value from virtio_net_flush_tx() to indicate a back-off
or error condition.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
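A minimal sketch of why the burst limit gives the flush a guaranteed exit, and how a signed return value leaves room for a back-off code. struct sim_queue and this flush_tx() are simplified stand-ins for illustration; the refill field simulates a guest that queues a new packet for every packet sent.

```c
#include <stdint.h>

/* Simulated queue: the guest adds 'refill' packets after each one sent. */
struct sim_queue {
    int32_t avail;
    int32_t refill;
};

/*
 * Send at most 'tx_burst' packets. Returns the number sent, or a
 * negative value to signal a back-off condition (here: 'busy').
 */
static int32_t flush_tx(struct sim_queue *q, int32_t tx_burst, int busy)
{
    int32_t sent = 0;

    if (busy) {
        return -1;
    }
    while (q->avail > 0) {
        q->avail--;
        q->avail += q->refill;      /* guest keeps producing */
        if (++sent >= tx_burst) {
            break;                  /* burst limit: guaranteed exit */
        }
    }
    return sent;
}
```

Without the burst check, the loop above would never terminate when refill is 1, which is the race the commit message describes.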
Add an option to make the TX mitigation timer adjustable as a device
option. The 150us hard coded default used currently is reasonable,
but may not be suitable for all workloads; this gives us a way to
adjust it using a single binary. We can't support any random option
though, so use the "x-" prefix to indicate this is a developer
option. Usage:
-device virtio-net-pci,x-txtimer=500000,... # .5ms timeout
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Changing block.h or blockdev.h resulted in recompiling most objects.
Move DriveInfo typedef and BlockInterfaceType enum definitions
to qemu-common.h and rearrange blockdev.h use to decrease churn.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Free malloc'ed memory, unregister from savevm and clean up virtio-common
bits on device hot-unplug.
This was found performing a migration after device hot-unplug.
Reported-by: <lihuang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Otherwise we can't migrate after we've removed a virtio block device.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the check from virtio_blk_init_pci(), where it protects only
virtio-blk-pci, to virtio_blk_init(). Without that, virtio-blk-s390
initializes without a drive. I figure that can lead to null pointer
dereferences.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It can't actually fail now, but the next commit will change that.
s390_virtio_blk_init() already checks for failure, but
virtio_blk_init_pci() doesn't. Fix that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Don't overwrite pci header type.
Otherwise, multi function bit which pci_init_header_type() sets
appropriately is lost.
Anyway, PCI_HEADER_TYPE_NORMAL is zero, so there is no need to zero a
field that is already zero-cleared.
how to test:
run qemu and issue info pci to check that the device in question is
reported as a normal device, not a pci-to-pci bridge.
This is handy because guest os isn't required.
tested changes:
The following files are covered by using the following commands.
  sparc64-softmmu
    apb_pci.c, vga-pci.c, cmd646.c, ne2k_pci.c, sun4u.c
  ppc-softmmu
    grackle_pci.c, cmd646.c, ne2k_pci.c, vga-pci.c, macio.c
  ppc-softmmu -M mac99
    unin_pci.c(uni-north, uni-north-agp)
  ppc64-softmmu
    pci-ohci, ne2k_pci, vga-pci, unin_pci.c(u3-agp)
  x86_64-softmmu
    acpi_piix4.c, ide/piix.c, piix_pci.c
    -vga vmware           vmware_vga.c
    -watchdog i6300esb    wdt_i6300esb.c
    -usb                  usb-uhci.c
    -sound ac97           ac97.c
    -nic model=rtl8139    rtl8139.c
    -nic model=pcnet      pcnet.c
    -balloon virtio       virtio-pci.c
untested changes:
The following changes aren't tested.
  prep_pci.c: ppc-softmmu -M prep should cover, but core dumped.
  unin_pci.c(uni-north-pci): the caller is commented out.
  openpic.c: the caller is commented out in ppc_prep.c
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
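The header type problem can be reproduced in a few lines. The register offset and bit value below are the standard PCI ones; init_header_type(), device_init_overwrite() and device_init_fixed() are simplified stand-ins written for this sketch and are not the functions touched by the patch.

```c
#include <stdint.h>

#define PCI_HEADER_TYPE                 0x0e
#define PCI_HEADER_TYPE_NORMAL          0x00
#define PCI_HEADER_TYPE_MULTI_FUNCTION  0x80

/* Stand-in for the core code that sets up the header type byte. */
static void init_header_type(uint8_t *config, int multifunction)
{
    config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL;
    if (multifunction) {
        config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
    }
}

/* Old device init: a plain assignment silently drops the
 * multi function bit set by the core code. */
static void device_init_overwrite(uint8_t *config)
{
    config[PCI_HEADER_TYPE] = PCI_HEADER_TYPE_NORMAL;
}

/* Fixed device init: the byte is already correct, so leave it alone. */
static void device_init_fixed(uint8_t *config)
{
    (void)config;
}
```

After init_header_type(cfg, 1), the fixed init keeps bit 7 set while the overwriting init clears it.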
Make the property point to BlockDriverState, cutting out the DriveInfo
middleman. This prepares the ground for block devices that don't have
a DriveInfo.
Currently all user-defined ones have a DriveInfo, because the only way
to define one is -drive & friends (they go through drive_init()).
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device. I'm
working towards a new way to define block devices, with clean
host/guest separation, and I need to get DriveInfo out of the way for
that.
Fortunately, the device models are perfectly happy with
BlockDriverState, except for two places: ide_drive_initfn() and
scsi_disk_initfn() need to check the DriveInfo for a serial number set
with legacy -drive serial=... Use drive_get_by_blockdev() there.
Device model code should now use DriveInfo only when explicitly
dealing with drives defined the old way, i.e. without -device.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>