VIRTIO_BLK_F_SCSI is supposed to mean whether the host can *parse*
SCSI requests, not *execute* them. You could run QEMU with scsi=on
and a file-backed disk, and QEMU would fail all SCSI requests even
though it advertises VIRTIO_BLK_F_SCSI.
Because we need to do this to fix a migration compatibility problem
related to how QEMU is invoked by management, we must do this
unconditionally even on older machine types. This more or less assumes
that no one ever invoked QEMU with scsi=off.
Here is how testing goes:
- old QEMU, scsi=on -> new QEMU, scsi=on
- new QEMU, scsi=on -> old QEMU, scsi=on
- old QEMU, scsi=off -> new QEMU, scsi=on
- new QEMU, scsi=off -> old QEMU, scsi=on
ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)
- old QEMU, scsi=off -> new QEMU, scsi=off
ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)
- old QEMU, scsi=on -> new QEMU, scsi=off
ok, bug fixed
- new QEMU, scsi=on -> old QEMU, scsi=off
doesn't work (same as: old QEMU, scsi=on -> old QEMU, scsi=off)
- new QEMU, scsi=off -> old QEMU, scsi=off
broken by the patch
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We will have to add another field to the virtio-blk configuration in
the next patch. Avoid a proliferation of arguments to virtio_blk_init.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Move it from virtio_blk_exit_pci to virtio_blk_exit.
This is included here because the next patch removes proxy->block.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Linux really looks only at scsi->errors for SG_IO requests; it does
not look at the virtio request status at all. Because of this, when
a SG_IO request is failed early with virtio_blk_req_complete(req,
VIRTIO_BLK_S_UNSUPP), without writing hdr.status, it will look like
a success to the guest.
This is their bug, but we can make it safe for older guests now by
forcing scsi->errors to have a non-zero value whenever a request
has to be failed.
But if we fix the bug in the guest driver, we will have another problem
because QEMU returns VIRTIO_BLK_S_IOERR if the status is non-zero, and
Linux translates that to -EIO. Rather, the guest should succeed the
request and pass the non-zero status via the userspace-provided SG_IO
structure. So, remove the case where virtio_blk_handle_scsi can
return VIRTIO_BLK_S_IOERR.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Allow load_image_targphys to load files on systems with more than 2G of
emulated memory by changing the max_sz parameter from an int to an
uint64_t.
Reviewed-by: Andreas F=E4rber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* mdroth/qga-pull-5-15-12:
qemu-ga: align versioning with QEMU_VERSION
qemu-ga: fix segv after failure to open log file
qemu-ga: guest-shutdown: use only async-signal-safe functions
qemu-ga: guest-shutdown: become synchronous
qemu-ga: guest-suspend: make the API synchronous
qemu-ga: become_daemon(): reopen standard fds to /dev/null
qemu-ga: make reopen_fd_to_null() public
qemu-ga: guest-suspend-hybrid: don't emit a success response
qemu-ga: guest-suspend-ram: don't emit a success response
qemu-ga: guest-suspend-disk: don't emit a success response
qemu-ga: guest-shutdown: don't emit a success response
qemu-ga: don't warn on no command return
qapi: add support for command options
Commit 93e9eb6808 added fdc-test,
but accidentally removed rtc-test because check-qtest-i386-y was
not enhanced but set twice.
This patch adds rtc-test again (and sorts both tests alphabetically).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Make use of the new vector notifier to track changes of the MSI-X
configuration of virtio PCI devices. On enabling events, we establish
the required virtual IRQ to MSI-X message route and link the signaling
eventfd file descriptor to this vIRQ line. That way, vhost-generated
interrupts can be directly delivered to an in-kernel MSI-X consumer like
the x86 APIC.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Analogously to msi_nr_vectors_allocated, add a service for MSI-X. Will
be used by the virtio-pci layer.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Provide a dummy kvm_kernel_irqchip so that kvm_irqchip_in_kernel can be
used by code that is not under CONFIG_KVM protection.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add services to associate an eventfd file descriptor as input with an
IRQ line as output. Such a line can be an input pin of an in-kernel
irqchip or a virtual line returned by kvm_irqchip_add_route.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Automatically commit route changes after kvm_add_routing_entry and
kvm_irqchip_release_virq. There is no performance relevant use case for
which collecting multiple route changes is beneficial. This makes
kvm_irqchip_commit_routes an internal service which assert()s that the
corresponding IOCTL will always succeed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This allows to drop routes created by kvm_irqchip_add_irq/msi_route
again.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add a service that establishes a static route from a virtual IRQ line to
an MSI message. Will be used for IRQFD and device assignment. As we will
use this service outside of CONFIG_KVM protected code, stub it properly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will add kvm_irqchip_add_msi_route, so let's make the difference
clearer.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Vector notifiers shall be triggered by the MSI/MSI-X core whenever a
relevant configuration change is programmed by the guest. In case of
MSI-X, changes are reported when the effective mask (global &&
per-vector) alters its state. On unmask, the current vector
configuration is included in the event report. This allows users - e.g.
virtio-pci layer - to transfer this information to external MSI-X
routing subsystems - like vhost + KVM in-kernel irqchip.
This implementation only provides MSI-X support, but extension to MSI is
feasible and will be provided later on when adding support for KVM PCI
device assignment.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In preparation of firing vector notifiers on mask changes, call
msix_handle_mask_update also from msix_mask_all. So far, this will have
no real effect.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This helper will also be used by the upcoming config notifier.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When QEMU was built with the simple trace backend, linking failed:
LINK tests/fdc-test
oslib-posix.o: In function `trace_qemu_memalign':
qemu/bin/debug/x86/./trace.h:31: undefined reference to `trace3'
oslib-posix.o: In function `trace_qemu_vmalloc':
qemu/bin/debug/x86/./trace.h:35: undefined reference to `trace2'
oslib-posix.o: In function `trace_qemu_vfree':
qemu/bin/debug/x86/./trace.h:39: undefined reference to `trace1'
collect2: error: ld returned 1 exit status
make: *** [tests/fdc-test] Fehler 1
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
There's some dodgy application of De Morgan's law in the emulation
of the MIPS BC1ANY[24]F instructions: they end up branching only
if all CCs are false, rather than if one CC is.
Tested on mips64-linux-gnu, where it fixes the GCC MIPS3D tests.
Signed-off-by: Richard Sandiford <rdsandiford@googlemail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
If we execute linux-user code that does the following:
* A = mmap()
* execute code in A
* munmap(A)
* B = mmap(), but mmap returns the same address as A
* execute code in B
we end up executing a stale cached tb that contains translated code
from A, while we want new code from B.
This patch adds a TB flush for mmap'ed regions, before we return them,
avoiding the whole issue. It also adds a flush for munmap, so that we
don't execute stale TBs instead of getting a segfault.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
There are no outside references to virtio_portio.
Add missing 'static' specifier.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Initrd load address is too low, it conflicts with kernel load
address:
rom: requested regions overlap (rom phdr #0: /tmp/vmlinux-debian-6.0.4-sparc64. free=0x0000000000742519, addr=0x0000000000400000)
rom loading failed
Fix by making the initrd address variable, load initrd after kernel
image. Use 64 bit variables instead of longs or 32 bit types.
Tested-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Split IN_T into BSIZE and ITYPE, to avoid expansion if the OS has
defined macros for the intX_t and uintX_t types. The IN_T constant is
then defined in mixeng_template.h so it can be used by the
functions/macros on this header file.
This change has been tested successfully under Debian Linux and NetBSD
6.0BETA.
Cc: Vassili Karpov (malc) <av1474@comtv.ru>
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: malc <av1474@comtv.ru>
Implemented the swapb and swaph byte/halfword reversal instructions added
to microblaze v8.30
Signed-off-by: Peter A. G. Crosthwaite <peter.crosthwaite@petalogix.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: John V. Baboval <john.baboval@virtualcomputer.com>
Signed-off-by: Tom Goetz <tom.goetz@virtualcomputer.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
In the context of PV-on-HVM under Xen, the emulated nics are supposed to be
unplug before the guest drivers are initialized, when the guest write to a
specific IO port.
Without this patch, the guest end up with two nics with the same MAC, the
emulated nic and the PV nic.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
While for the "normal" case (called from blk_send_response_all())
decrementing requests_finished is correct, doing so in the parse error
case is wrong; requests_inflight needs to be decremented instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Use bdrv_aio_flush instead of bdrv_flush.
Make sure to call bdrv_aio_writev/readv after the presync bdrv_aio_flush is fully
completed and make sure to call the postsync bdrv_aio_flush after
bdrv_aio_writev/readv is fully completed.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
This patch removes a dead option.
The same can be achieved removing BDRV_O_NOCACHE and BDRV_O_CACHE_WB
from the flags passed to bdrv_open.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
rtc_clock is only used by the RTC emulator (mc146818rtc.c), however Xen
has its own RTC emulator in the hypervisor so we can disable it.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
PIT and PCSPK are emulated by the hypervisor so we don't need to emulate
them in Qemu: this patch prevents Qemu from waking up needlessly at
PIT_FREQ on Xen.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
As MSI is now fully supported by KVM (/wrt available features in
upstream), we can finally enable the in-kernel irqchip by default.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If the kernel supports KVM_SIGNAL_MSI, we can avoid the route-based
MSI injection mechanism.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Catch writes to the MSI MMIO region in the KVM APIC and forward them to
the kernel. Provide the kernel support GSI routing, this allows to
enable MSI support also for in-kernel irqchip mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Push msi_supported enabling to the APIC implementations where we can
encapsulate the decision more cleanly, hiding the details from the
generic code.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch basically adds kvm_irqchip_send_msi, a service for sending
arbitrary MSI messages to KVM's in-kernel irqchip models.
As the original KVM API requires us to establish a static route from a
pseudo GSI to the target MSI message and inject the MSI via toggling
that virtual IRQ, we need to play some tricks to make this interface
transparent. We create those routes on demand and keep them in a hash
table. Succeeding messages can then search for an existing route in the
table first and reuse it whenever possible. If we should run out of
limited GSIs, we simply flush the table and rebuild it as messages are
sent.
This approach is rather simple and could be optimized further. However,
latest kernels contains a more efficient MSI injection interface that
will obsolete the GSI-based dynamic injection.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Will be used for generating and distributing MSI messages, both in
emulation mode and under KVM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of the bitmap size, store the maximum of GSIs the kernel
support. Move the GSI limit assertion to the API function
kvm_irqchip_add_route and make it stricter.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Previously qemu-ga version was defined seperately. Since it is aligned
with QEMU releases, use QEMU_VERSION instead. This also implies the
version bump for 1.1[-rcN] release of qemu-ga.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently, if we fail to open the specified log file (generally due to a
permissions issue), we'll assign NULL to the logfile handle (stderr,
initially) used by the logging routines, which can cause a segfault to
occur when we attempt to report the error before exiting.
Instead, only re-assign if the open() was successful.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
POSIX mandates[1] that a child process of a multi-thread program uses
only async-signal-safe functions before exec(). We consider qemu-ga
to be multi-thread, because it uses glib.
However, qmp_guest_shutdown() uses functions that are not
async-signal-safe. Fix it the following way:
- fclose() -> reopen_fd_to_null()
- execl() -> execle()
- exit() -> _exit()
- drop slog() usage (which is not safe)
[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Last commit dropped qemu-ga's SIGCHLD handler, used to automatically
reap terminated children processes. This introduced a bug to
qmp_guest_shutdown(): it will generate zombies.
This problem probably doesn't matter in the success case, as the VM
will shutdown anyway, but let's do the right thing and reap the
created process. This ultimately means that guest-shutdown is now a
synchronous command.
An interesting side effect is that guest-shutdown is now able to
report an error to the client if shutting down fails.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently, qemu-ga has a SIGCHLD handler that automatically reaps terminated
children processes. The idea is to avoid having qemu-ga commands blocked
waiting for children to terminate.
That approach has two problems:
1. qemu-ga is unable to detect errors in the child, meaning that qemu-ga
returns success even if the child fails to perform its task
2. if a command does depend on the child exit status, the command has to
play tricks to bypass the automatic reaper
Case 2 impacts the guest-suspend-* API, because it has to execute an external
program to check for suspend support. Today, to bypass the automatic reaper,
suspend code has to double fork and pass exit status information through a
pipe. Besides being complex, this is prone to race condition bugs. Indeed,
the current code does have such bugs.
Making the guest-suspend-* API synchronous (ie. by dropping the SIGCHLD
handler and calling waitpid() from commands) is a much simpler approach,
which fixes current race conditions bugs and enables commands to detect
errors in the child.
This commit does just that. There's a side effect though, guest-shutdown
will generate zombies if shutting down fails. This will be fixed by the
next commit.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This fixes a bug where qemu-ga doesn't suspend the guest because it
fails to detect suspend support even when the guest does support
suspend. This happens because of the way qemu-ga fds are managed in
daemon mode.
When starting qemu-ga with --daemon, become_daemon() will close all
standard fds. This will cause qemu-ga to end up with the following
fds (if started with 'qemu-ga --daemon'):
0 -> /dev/vport0p1
3 -> /run/qemu-ga.pid
Then a guest-suspend-* function is issued. They call bios_supports_mode(),
which will call pipe(), and qemu-ga's fd will be:
0 -> /dev/vport0p1
1 -> pipe:[16247]
2 -> pipe:[16247]
3 -> /run/qemu-ga.pid
bios_supports_mode() forks off a child and blocks waiting for the child
to write something to the pipe. The child, however, closes its reading
end of the pipe _and_ reopen all standard fds to /dev/null. This will
cause the child's fds to be:
0 -> /dev/null
1 -> /dev/null
2 -> /dev/null
3 -> /run/qemu-ga.pid
In other words, the child's writing end of the pipe is now /dev/null.
It writes there and exits. The parent process (blocked on read()) will
get an EOF and interpret this as "something unexpected happened in
the child, let's assume the guest doesn't support suspend". And suspend
will fail.
To solve this problem we have to reopen standard fds to /dev/null
in become_daemon(), instead of closing them.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The next commit wants to use it.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Today, qemu-ga may not be able to emit a success response when
guest-suspend-hybrid completes. This happens because the VM may
suspend before qemu-ga is able to emit a response.
This semantic is a bit confusing, as it's not clear for clients if
they should wait for a response or how they should check for success.
This commit solves that problem by changing guest-suspend-hybrid to
never emit a success response and suggests in the documentation
what clients should do to check for success.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Today, qemu-ga may not be able to emit a success response when
guest-suspend-ram completes. This happens because the VM may
suspend before qemu-ga is able to emit a response.
This semantic is a bit confusing, as it's not clear for clients if
they should wait for a response or how they should check for success.
This commit solves that problem by changing guest-suspend-ram to
never emit a success response and suggests in the documentation
what clients should do to check for success.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>