Commit Graph

15968 Commits

Author SHA1 Message Date
Christoph Hellwig
c488c7f649 block: latency accounting
Account the total latency for read/write/flush requests.  This allows
management tools to average it based on a snapshot of the nr ops
counters and allow checking for SLAs or provide statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-26 18:18:38 +02:00
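The averaging described in c488c7f649 can be sketched as follows. This is a minimal illustration, not QEMU code: the `Snapshot` type, its field names, and the nanosecond unit are assumptions made for the example. The idea is that a management tool takes two snapshots of the operation counter and the accumulated latency, then divides the difference in latency by the difference in operations.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical snapshot of one request type's counters (names are illustrative)."""
    ops: int               # number of completed requests
    total_latency_ns: int  # latency accumulated over those requests (unit assumed)


def average_latency_ns(earlier: Snapshot, later: Snapshot) -> float:
    """Average latency of the requests completed between two snapshots."""
    ops = later.ops - earlier.ops
    if ops <= 0:
        return 0.0  # nothing completed in the interval, so there is nothing to average
    return (later.total_latency_ns - earlier.total_latency_ns) / ops


if __name__ == "__main__":
    before = Snapshot(ops=1000, total_latency_ns=5_000_000)
    after = Snapshot(ops=1500, total_latency_ns=8_000_000)
    # 3,000,000 ns spread over 500 requests
    print(average_latency_ns(before, after))
```

Working from deltas rather than absolute totals gives the average for the sampling interval only, which is what an SLA check would normally want.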
Christoph Hellwig
a597e79ce1 block: explicit I/O accounting
Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.

This means:
 - we do not count internal requests from image formats in addition
   to guest originating I/O
 - we do not double count I/O ops if the device model handles it
   chunk wise
 - we only account I/O once it actually is done
 - can extend I/O accounting to synchronous or coroutine I/O easily
 - implement I/O latency tracking easily (see the next patch)

I've converted the existing device model callers to the new model;
device models that are using synchronous I/O and weren't accounted
before haven't been updated yet.  Also scsi hasn't been converted
to the end-to-end accounting as I want to defer that until after the
pending scsi layer overhaul.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 18:18:42 +02:00
Frediano Ziglio
2f4b759367 qcow2: remove unused qcow2_create_refcount_update function
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 15:23:10 +02:00
Frediano Ziglio
35ee5e39c5 qcow2: always use stderr for debugging
Make all DEBUG_ALLOC2 printf output go to stderr

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 15:22:25 +02:00
MORITA Kazutaka
2df4624662 sheepdog: use coroutines
This makes the sheepdog block driver support bdrv_co_readv/writev
instead of bdrv_aio_readv/writev.

With this patch, Sheepdog network I/O becomes fully asynchronous.  The
block driver yields back when send/recv returns EAGAIN, and is resumed
when the sheepdog network connection is ready for the operation.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-24 14:53:51 +02:00
Frediano Ziglio
ab0997e0af qcow2: remove memory leak
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:15 +02:00
Frediano Ziglio
3fc48d0983 qcow2: Removed QCowAIOCB entirely
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
5ebaa27e9a qcow2: reindent and use while before the big jump
prepare to remove read/write callbacks

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
e78c69b89c qcow2: remove common from QCowAIOCB
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
c2bdd9904b qcow2: remove cluster_offset from QCowAIOCB
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
c227140397 qcow2: remove l2meta from QCowAIOCB
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
faf575c136 qcow2: removed cur_nr_sectors field in QCowAIOCB
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
4617310c33 qcow2: Removed unused AIOCB fields
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
122bbd1dd9 qcow: remove old #undefined code
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
27deebe836 qcow: Remove QCowAIOCB
Embed qcow_aio_read_cb into qcow_co_readv and qcow_aio_write_cb into qcow_co_writev

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
43ca85b559 qcow: move some blocks of code to avoid useless variable initialization
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
430bbaaa95 qcow: QCowAIOCB field cleanup
Remove unused fields from this structure and move some of the others into qcow_aio_read_cb and qcow_aio_write_cb

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Frediano Ziglio
f5cd8173e7 qcow/qcow2: Allocate QCowAIOCB structure using stack
Allocate the structure on the stack instead of calling qemu_aio_get

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Avi Kivity
e4ea78ee76 posix-aio-compat: fix latency issues
In certain circumstances, posix-aio-compat can incur a lot of latency:
 - threads are created by vcpu threads, so if vcpu affinity is set,
   aio threads inherit vcpu affinity.  This can cause many aio threads
   to compete for one cpu.
 - we can create up to max_threads (64) aio threads in one go; since a
   pthread_create can take around 30μs, we have up to 2ms of cpu time
   under a global lock.

Fix by:
 - moving thread creation to the main thread, so we inherit the main
   thread's affinity instead of the vcpu thread's affinity.
 - if a thread is currently being created, and we need to create yet
   another thread, let the thread being born create the new thread, reducing
   the amount of time we spend under the main thread.
 - drop the local lock while creating a thread (we may still hold the
   global mutex, though)

Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it.  We may want pre-allocated
threads when this cannot be tolerated.

Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
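The "up to 2ms" figure in e4ea78ee76 follows directly from the two numbers the commit message gives. A quick check, using only those values (the constant names are illustrative):

```python
MAX_THREADS = 64        # max_threads, as stated in the commit message
PTHREAD_CREATE_US = 30  # approximate cost of one pthread_create, as stated

# Worst case: all threads are created in one go while the global lock is held.
worst_case_us = MAX_THREADS * PTHREAD_CREATE_US
worst_case_ms = worst_case_us / 1000

if __name__ == "__main__":
    print(f"{worst_case_us} us = {worst_case_ms:.2f} ms")
```

This gives 1920 μs, which the commit message rounds to 2 ms.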
Christoph Hellwig
e8045d6726 block: include flush requests in info blockstats
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Nicholas Thomas
f785a5ae36 block/curl: Handle failed reads gracefully.
Current behaviour if a read fails is for the acb to not get finished.
This causes an infinite loop in bdrv_read_em (block.c). The read failure
never gets reported to the guest and if the error condition clears, the
process never recovers.

With this patch, when curl reports a failure we finish the acb as a
failure. This results in the guest receiving an I/O error (rather than
the read hanging indefinitely) and if the error condition subsequently
clears, retries work as expected.

The simplest test is to put an ISO on a web server you have control over
and open it with qemu-io. Then move the ISO out of the way and attempt
to read some data - you should see behaviour matching the above.

Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Stefan Hajnoczi
3fba9d8198 qemu-img: print error codes when convert fails
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Scott Wood
de33b1f3dd qcow: initialize coroutine mutex
commit 52b8eb6013 added a mutex,
but never initialized it.  This caused a segfault.

Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Devin Nakamura
d57237f291 qcow2: fix typo in documentation for qcow2_get_cluster_offset()
The documentation states that num is measured in clusters, but it's
actually measured in sectors

Signed-off-by: Devin Nakamura <devin122@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Kevin Wolf
bb1c05973c qemu-img: Use qemu_blockalign
Now that you can use cache=none for the output file in qemu-img, we should
properly align our buffers so that raw-posix doesn't have to use its (smaller)
bounce buffer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-08-23 14:15:17 +02:00
Philipp Hahn
6cbc3031c8 qcow2: Fix DEBUG_* compilation
The introduction of BlockDriverState broke compiling qcow2 with DEBUG_ALLOC
and DEBUG_EXT defined.
Define a BdrvCheckResult structure locally which is now needed as the second
argument.

Also fix qcow2_read_extensions() needing BDRVQcowState.

Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Stefan Hajnoczi
92196b2f56 block: add cache=directsync parameter to -drive
This patch adds -drive cache=directsync for O_DIRECT | O_SYNC host file
I/O with no disk write cache presented to the guest.

This mode is useful when guests may not be sending flushes when
appropriate and therefore leave data at risk in case of power failure.
When cache=directsync is used, write operations are only completed to
the guest when data is safely on disk.

This new mode is like cache=writethrough but it bypasses the host page
cache.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Stefan Hajnoczi
c3993cdca3 block: parse cache mode flags in a single place
This patch introduces bdrv_parse_cache_flags() which sets open flags
given a cache mode.  Previously this was duplicated in blockdev.c and
qemu-img.c.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
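The two cache-mode commits above (92196b2f56 and c3993cdca3) centralize the mapping from a cache mode name to open flags. The sketch below models that idea only; it is not the body of bdrv_parse_cache_flags(), and the `OpenFlags` names are hypothetical. The `directsync` and `writethrough` rows follow the commit text (no write cache presented to the guest, with `directsync` also bypassing the host page cache). The `none` and `writeback` rows are assumptions based on how those modes are commonly described.

```python
from enum import Flag


class OpenFlags(Flag):
    """Hypothetical open flags; not QEMU's actual flag names or values."""
    NONE = 0
    BYPASS_HOST_CACHE = 1  # O_DIRECT-style I/O that skips the host page cache
    GUEST_WRITE_CACHE = 2  # a disk write cache is presented to the guest


# One table, so every caller shares the same parsing logic.
CACHE_MODES = {
    "writethrough": OpenFlags.NONE,
    "directsync": OpenFlags.BYPASS_HOST_CACHE,
    "writeback": OpenFlags.GUEST_WRITE_CACHE,                            # assumption
    "none": OpenFlags.BYPASS_HOST_CACHE | OpenFlags.GUEST_WRITE_CACHE,   # assumption
}


def parse_cache_flags(mode: str) -> OpenFlags:
    """Return the open flags for a cache mode name, or raise ValueError."""
    try:
        return CACHE_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown cache mode: {mode!r}") from None


if __name__ == "__main__":
    flags = parse_cache_flags("directsync")
    print(OpenFlags.BYPASS_HOST_CACHE in flags)  # host page cache bypassed?
    print(OpenFlags.GUEST_WRITE_CACHE in flags)  # write cache shown to guest?
```

Keeping the mapping in one place means a new mode such as `directsync` only has to be added once instead of in each caller.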
Aneesh Kumar K.V
12888904fe coroutine: Add CoRwlock support
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Peter A. G. Crosthwaite
b861b7419c xilinx: removed microblaze_pic_init from xilinx.h
This is a microblaze target specific function that belongs outside
of xilinx.h (which is a collection of target independent device model
instantiator functions)

Signed-off-by: Peter A. G. Crosthwaite <peter.crosthwaite@petalogix.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2011-08-22 23:29:37 +02:00
Peter A. G. Crosthwaite
0d877c66b6 xilinx.h: Added missing includes
Added some missing #includes for this file. Previously this file
relied on its clients to pre-include its dependencies.

Signed-off-by: Peter A. G. Crosthwaite <peter.crosthwaite@petalogix.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2011-08-22 23:29:37 +02:00
Jan Kiszka
f8b8d633f6 sdl: Don't release input on mouse mode change in full-screen mode
While in full-screen mode, the input focus naturally belongs to the SDL
window. Avoid dropping it when switching from absolute to relative
mouse mode.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:04 -05:00
Jan Kiszka
fa7d186757 Replace qemu_system_cond with VCPU stop mechanism
We can express the VCPU thread wakeup with the stop mechanism, saving
both qemu_system_ready and the qemu_system_cond. For KVM threads, we can
just enter the main loop as long as the thread is stopped. The central
TCG thread is better held back before the loop as there can be side
effects of the services called even when all CPUs are stopped.

Creating VCPUs in stopped state will also be required for proper CPU
hotplugging support.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
78dd9ff632 vga: Drop some unused fields
Memory region refactorings obsoleted them.

CC: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
80763888bf vga: Use linear mapping + dirty logging in chain 4 memory access mode
Most VGA memory access modes require MMIO handling as they demand weird
logic to get a byte from or into the video RAM. However, there is one
exception: chain 4 mode with all memory planes enabled for writing. This
mode actually allows linear mapping, which can then be combined with
dirty logging to accelerate KVM.

This patch accelerates specifically VBE accesses like they are used by
grub in graphical mode. Not only the standard VGA adapter benefits from
this, also vmware and spice in VGA mode.

CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
fe55ff6e61 vmware-vga: Eliminate vga_dirty_log_restart
After the conversion to the new Memory API, vga_dirty_log_restart became
seriously pointless. Remove it from vmware-vga and then finally drop
the service.

CC: Andrzej Zaborowski <balrogg@gmail.com>
CC: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
8d121d4960 vmware-vga: Remove dead DIRECT_VRAM mode
The code was disabled since day 1 of vmware-vga, and now it does not
even build anymore. Time for a cleanup.

CC: Andrzej Zaborowski <balrogg@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
ca0508df2e vmware-vga: Disable verbose mode
Eliminates 'vmsvga_value_write: guest runs Linux.' messages from the
console.

CC: Andrzej Zaborowski <balrogg@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
8a9501bae2 vmware-vga: Register reset service
Fixes cold reset in vmware graphic modes. We need to split up the reset
function for this purpose, breaking out init-once bits.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
0035e5094c ioapic: Implement polarity
If the polarity bit is set in the redirection table, the input level
simply has to be inverted as it is active low in this case.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
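The polarity handling described in 0035e5094c amounts to an XOR of the input level with the polarity bit. The sketch below is illustrative rather than the QEMU implementation; the function name is hypothetical, and the bit position (13) is an assumption taken from the usual I/O APIC redirection table layout.

```python
POLARITY_BIT = 13  # assumed position of the polarity bit in a redirection table entry


def effective_level(redirection_entry: int, pin_level: int) -> int:
    """Return the logical interrupt level for a pin, honoring its polarity bit.

    If the polarity bit is set, the pin is active low, so the raw input
    level is inverted; otherwise it is passed through unchanged.
    """
    active_low = (redirection_entry >> POLARITY_BIT) & 1
    return (pin_level & 1) ^ active_low


if __name__ == "__main__":
    active_high_entry = 0
    active_low_entry = 1 << POLARITY_BIT
    print(effective_level(active_high_entry, 1))  # raw high on an active-high pin
    print(effective_level(active_low_entry, 1))   # raw high on an active-low pin
    print(effective_level(active_low_entry, 0))   # raw low on an active-low pin
```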
Jan Kiszka
1f6f408c8c target-i386: Remove unused polarity arguments from APIC API
Polarity of external interrupts needs to be handled in the IOAPIC.
Passing it to the APIC is pointless. So remove all these arguments.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:03 -05:00
Jan Kiszka
eae74cf906 Do not kick vcpus in TCG mode
In TCG mode, iothread and vcpus run in lock-step. So it's pointless to
send a signal from qemu_cpu_kick to the vcpu thread - if we got here,
the receiver already left the vcpu loop.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:02 -05:00
Jan Kiszka
c9f711a5d3 Poll main loop after I/O events were received
Polling until select returns empty fdsets helps to reduce the switches
between iothread and vcpus. The benefit of this patch is best visible
when running an SMP guest on an SMP host in emulation mode.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:02 -05:00
Jan Kiszka
200668ba08 Do not drop global mutex for polled main loop runs
If we call select without a timeout, it's more efficient to keep the
global mutex locked as we may otherwise just play ping pong with a
vcpu thread contending for it. This is particularly important for TCG
mode where we run in lock-step with the vcpu thread.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 14:37:02 -05:00
Anthony Liguori
6e23063c46 Merge remote-tracking branch 'qemu-kvm/memory/core' into staging
2011-08-22 12:26:30 -05:00
Edgar E. Iglesias
22a78d64cc microblaze-user: Deliver SIGFPE on div by zero
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2011-08-22 18:47:38 +02:00
Richard Henderson
563ea48903 memory: Fix old_portio vs non-zero offset
The legacy functions that we're wrapping expect that offset
to be included in the register.  Indeed, they generally
expect the absolute address and then mask off the "high" bits.

The FDC is the first converted device with a non-zero offset.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-08-22 19:29:04 +03:00
Anthony Liguori
a5e1cbc80e memory: temporarily suppress the subregion collision warning
After 312b4234, the APIC and PCI devices are colliding with each other.  This
is harmless in practice because the APIC accesses are special cased and never
make their way onto the bus.

Avi is working on a proper fix, but until that's ready, avoid printing the
warning.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 11:14:56 -05:00
Avi Kivity
ae0a54664c 440fx: fix PAM, PCI holes
The current implementation of PAM and the PCI holes is broken in several
ways:

  - PCI BARs are not restricted to the PCI hole (a BAR may hide memory)
  - PCI devices do not respect PAM (if a PCI device maps a region while
    PAM maps the region to RAM, the request will be honored)

This patch fixes things by introducing a pci address space, and using
memory region aliases to represent PAM regions, SMRAM, and PCI holes.

The memory hierarchy looks something like

system_memory
 |
 +--- low memory alias (0-0xe0000000)
 |      |
 |      +-- ram@0
 |
 +--- high memory alias (0x100000000-EOM)
 |      |
 |      +-- ram@0xe0000000
 |
 +--- pci hole alias (end of low memory-0x100000000)
 |      |
 |      +-- pci@end-of-low-memory
 |
 |
 +--- pam[n] (0xc0000-0xc3fff etc) (when set to pci, priority 1)
 |      |
 |      +-- pci@0xc4000 etc
 |
 +--- smram (0xa0000-0xbffff) (when set to pci/vga, priority 1)
        |
        +-- pci@0xa0000 etc

ram (simple ram region)

pci
 |
 +--- BARn
 |
 +--- VGA 0xa0000-0xbffff
 |
 +--- ROMs

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 10:47:49 -05:00
Avi Kivity
be20f9e902 vga: drop get_system_memory() from vga devices and derivatives
Instead, use the bus accessors, or get the address space directly
from the board constructor.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 10:47:49 -05:00