Commit Graph

579 Commits

Author SHA1 Message Date
Paolo Bonzini
a275fa42fa block: do not reuse the backing file across bdrv_close/bdrv_open
This is another bug caused by not doing a full cleanup of the BDS
across close/open.  This was found with mirroring by Shaolong Hu,
but it can probably be reproduced also with eject or change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
3a389e7926 block: another bdrv_append fix
bdrv_append must also copy open_flags to the top, because the snapshot
has BDRV_O_NO_BACKING set.  This causes interesting results if you
later use drive-reopen (not upstream) to reopen the image, and lose
the backing file in the process.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
e023b2e244 block: fix snapshot on QED
QED's opaque data includes a pointer back to the BlockDriverState.
This breaks when bdrv_append shuffles data between bs_new and bs_top.
To avoid this, add a "rebind" function that tells the driver about
the new relationship between the BlockDriverState and its opaque.

The patch also adds rebind to VVFAT for completeness, even though
it is not used with live snapshots.

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
71df14fcbe block: fix allocation size for dirty bitmap
Also reuse elsewhere the new constant for sizeof(unsigned long) * 8.

The dirty bitmap is allocated in bits but declared as unsigned long.
Thus, its memory block is accessed beyond its end unless the image
is a multiple of 64 chunks (i.e. a multiple of 64 MB).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
63090dac3a block: open backing file as read-only when probing for size
bdrv_img_create will temporarily open the backing file to probe its size.
However, this could be done with a read-write open if the wrong flags are
passed to bdrv_img_create.  Since there is really no documentation on
what flags can be passed, assume that bdrv_img_create receives the flags
with which the new image will be opened; sanitize them when opening
the backing file.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
469ef350e1 block: update in-memory backing file and format
These are needed to print "info block" output correctly.  QCOW2 does this
because it needs it to write the header, but QED does not, and common code
is the right place to do it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
5f3777945d block: push bdrv_change_backing_file error checking up from drivers
This check applies to all drivers, but QED lacks it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Zhi Yong Wu
4c355d53c6 block: add the support to drain throttled requests
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
[ Iterate until all block devices have processed all requests,
  add comments. - Paolo ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Zhi Yong Wu
5b7e1542cf block: make bdrv_create adopt coroutine
The current qemu.git introduces failure with preallocation and some
sizes:

qemu-img create -f qcow2 new.img 976563K -o preallocation=metadata
qemu-img: qemu-coroutine-lock.c:111: qemu_co_mutex_unlock: Assertion
`mutex->locked == 1' failed.

And lock needs to work in coroutine context. So to fix this issue, we
need to make bdrv_create adopt coroutine at first.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Stefan Hajnoczi
c83c66c3b5 block: add 'speed' optional parameter to block-stream
Allow streaming operations to be started with an initial speed limit.
This eliminates the window of time between starting streaming and
issuing block-job-set-speed.  Users should use the new optional 'speed'
parameter instead so that speed limits are in effect immediately when
the job starts.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
882ec7ce53 block: change block-job-set-speed argument from 'value' to 'speed'
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
9e6636c72d block: use Error mechanism instead of -errno for block_job_set_speed()
There are at least two different errors that can occur in
block_job_set_speed(): the job might not support setting speeds or the
value might be invalid.

Use the Error mechanism to report the error where it occurs.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
fd7f8c6537 block: use Error mechanism instead of -errno for block_job_create()
The block job API uses -errno return values internally and we convert
these to Error in the QMP functions.  This is ugly because the Error
should be created at the point where we still have all the relevant
information.  More importantly, it is hard to add new error cases to
this case since we quickly run out of -errno values without losing
information.

Go ahead and use Error directly and don't convert later.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Kevin Wolf
621f058940 qcow2: Zero write support
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:30 +02:00
Liu Yuan
80ccf93b88 qemu-img: let 'qemu-img convert' flush data
The 'qemu-img convert -h' advertise that the default cache mode is
'writeback', while in fact it is 'unsafe'.

This patch 1) fix the help manual and 2) let bdrv_close() call bdrv_flush()

2) is needed because some backend storage doesn't have a self-flush
mechanism(for e.g., sheepdog), so we need to call bdrv_flush() to make
sure the image is really writen to the storage instead of hanging around
writeback cache forever.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 11:42:41 +02:00
Kevin Wolf
7094f12f86 block: Drain requests in bdrv_close
If an AIO request is in flight that refers to a BlockDriverState that
has been closed and possibly even freed, more or less anything could
happen. I have seen segfaults, -EBADF return values and qcow2 sometimes
actually catches the situation in bdrv_close() and abort()s.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 15:48:52 +02:00
Benoît Canet
077892696b block: add a function to clear incoming live migration flags
This function will clear all BDRV_O_INCOMING flags.

Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 16:27:56 +02:00
Jeff Cody
f6801b83d0 block: bdrv_append() fixes
A few fixups for bdrv_append():

The new bs (bs_new) passed into bdrv_append() should be anonymous.  Rather
than call bdrv_make_anon() to enforce this, use an assert to catch when a caller
is passing in a bs_new that is not anonymous.

Also, the new top layer should have its backing_format reflect the original
top's format.

And last, after the swap of bs contents, the device_name will have been copied
down. This needs to be cleared to reflect the anonymity of the bs that was
pushed down.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:41 +02:00
Paolo Bonzini
9f25eccc1c block: set job->speed in block_set_speed
There is no need to do this in every implementation of set_speed
(even though there is only one right now).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini
3e914655f2 block: fix streaming/closing race
Streaming can issue I/O while qcow2_close is running.  This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device.  The fix is to cancel streaming jobs when closing their
underlying device.

The cancellation must be synchronous, on the other hand qemu_aio_wait
will not restart a coroutine that is sleeping in co_sleep.  So add
a flag saying whether streaming has in-flight I/O.  If the busy flag
is false, the coroutine is quiescent and, when cancelled, will not
issue any new I/O.

This protects streaming against closing, but not against deleting.
We have a reference count protecting us against concurrent deletion,
but I still added an assertion to ensure nothing bad happens.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Zhi Yong Wu
498e386c58 block: disable I/O throttling on sync api
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini
29cdb2513c block: push recursive flushing up from drivers
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:39 +02:00
Stefan Hajnoczi
e88774971c block: handle -EBUSY in bdrv_commit_all()
Monitor operations that manipulate image files must not execute while a
background job (like image streaming) is in progress.  This prevents
corruptions from happening when two pieces of code are manipulating the
image file without knowledge of each other.

The monitor "commit" command raises QERR_DEVICE_IN_USE when
bdrv_commit() returns -EBUSY but "commit all" has no error handling.
This is easy to fix, although note that we do not deliver a detailed
error about which device was busy in the "commit all" case.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-03-12 15:14:06 +01:00
Jeff Cody
8802d1fdd4 qapi: Introduce blockdev-group-snapshot-sync command
This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.

It is attempted to keep the snapshot of the group atomic; if the
creation or open of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned.  The failure case should not interrupt
any operations.

Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields.  Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.

This allows the group of disks to remain consistent with each other,
even across snapshot failures.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 15:48:33 +01:00
Paolo Bonzini
b6a127a156 block: drop aio_multiwrite in BlockDriver
These were never used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:47 +01:00
Hervé Poussineau
f8d3d12857 block: add a transfer rate for floppy types
Floppies must be read at a specific transfer rate, depending of its own format.
Update floppy description table to include required transfer rate.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:46 +01:00
Luiz Capitulino
6f382ed226 qmp: add DEVICE_TRAY_MOVED event
It's emitted whenever the tray is moved by the guest or by HMP/QMP
commands.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:50 -02:00
Luiz Capitulino
f36f394952 block: bdrv_eject(): Make eject_flag a real bool
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:05 -02:00
Luiz Capitulino
329c0a48a9 block: Rename bdrv_mon_event() & BlockMonEventAction
They are QMP events, not monitor events. Rename them accordingly.

Also, move bdrv_emit_qmp_error_event() up in the file. A new event will
be added soon and it's good to have them next each other.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:22:35 -02:00
Stefan Hajnoczi
79c053bde9 block: perform zero-detection during copy-on-read
Copy-on-Read populates the image file with data read from a backing
image.  In order to avoid bloating the image file when all zeroes are
read we should scan the buffer and perform an optimized zero write
operation.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Stefan Hajnoczi
f08f2ddae0 block: add .bdrv_co_write_zeroes() interface
The ability to zero regions of an image file is a useful primitive for
higher-level features such as image streaming or zero write detection.

Image formats may support an optimized metadata representation instead
of writing zeroes into the image file.  This allows zero writes to be
potentially faster than regular write operations and also preserve
sparseness of the image file.

The .bdrv_co_write_zeroes() interface should be implemented by block
drivers that wish to provide efficient zeroing.

Note that this operation is different from the discard operation, which
may leave the contents of the region indeterminate.  That means
discarded blocks are not guaranteed to contain zeroes and may contain
junk data instead.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Marcelo Tosatti
e8a6bb9caa block: add bdrv_find_backing_image
Add bdrv_find_backing_image: given a BlockDriverState pointer, and an id,
traverse the backing image chain to locate the id.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Stefan Hajnoczi
eeec61f291 block: add BlockJob interface for long-running operations
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi
470c05047a block: make copy-on-read a per-request flag
Previously copy-on-read could only be enabled for all requests to a
block device.  This means requests coming from the guest as well as
QEMU's internal requests would perform copy-on-read when enabled.

For image streaming we want to support finer-grained behavior than just
populating the image file from its backing image.  Image streaming
supports partial streaming where a common backing image is preserved.
In this case guest requests should not perform copy-on-read because they
would indiscriminately copy data which should be left in a backing image
from the backing chain.

Introduce a per-request flag for copy-on-read so that a block device can
process both regular and copy-on-read requests.  Overlapping reads and
writes still need to be serialized for correctness when copy-on-read is
happening, so add an in-flight reference count to track this.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi
2d3735d3bf block: check bdrv_in_use() before blockdev operations
Long-running block operations like block migration and image streaming
must have continual access to their block device.  It is not safe to
perform operations like hotplug, eject, change, resize, commit, or
external snapshot while a long-running operation is in progress.

This patch adds the missing bdrv_in_use() checks so that block migration
and image streaming never have the rug pulled out from underneath them.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Paolo Bonzini
3f3aace830 block: avoid useless checks on acb->bh
Coverity is confused by this "if" and reports leaks on acb->bh.
The bottom half is always deleted before releasing the AIOCB,
in either bdrv_aio_cancel_em or bdrv_aio_bh_cb.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:08 +01:00
Paolo Bonzini
df9309fb43 block: simplify failure handling for bdrv_aio_multiwrite
Now that early failure of bdrv_aio_writev is not possible anymore,
mcb->num_requests can be set before the loop starts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:07 +01:00
Paolo Bonzini
ad54ae80c7 block: bdrv_aio_* do not return NULL
Initially done with the following semantic patch:

@ rule1 @
expression E;
statement S;
@@
  E =
(
   bdrv_aio_readv
|  bdrv_aio_writev
|  bdrv_aio_flush
|  bdrv_aio_discard
|  bdrv_aio_ioctl
)
     (...);
(
- if (E == NULL) { ... }
|
- if (E)
    { <... S ...> }
)

which however missed the occurrence in block/blkverify.c
(as it should have done), and left behind some unused
variables.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:07 +01:00
Stefan Hajnoczi
922453bca6 block: convert qemu_aio_flush() calls to bdrv_drain_all()
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O.  Most of these places actually want to drain all block
requests but there is no block layer API to do so.

This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete.  As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:56:06 +01:00
Stefan Hajnoczi
5f8b6491f2 block: wait_for_overlapping_requests() deadlock detection
Debugging a reentrant request deadlock was fun but in the future we need
a quick and obvious way of detecting such bugs.  Add an assert that
checks we are not about to deadlock when waiting for another request.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:52:34 +01:00
Stefan Hajnoczi
bd9533e36e block: implement bdrv_co_is_allocated() boundary cases
Cases beyond the end of the disk image are only implemented for block
drivers that do not provide .bdrv_co_is_allocated().  It's worth making
these cases generic so that block drivers that do implement
.bdrv_co_is_allocated() also get them for free.

Suggested-by: Mark Wu <wudxw@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:39 +01:00
Stefan Hajnoczi
ab1859218a block: core copy-on-read logic
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
d83947ac6d block: request overlap detection
Detect overlapping requests and remember to align to cluster boundaries
if the image format uses them.  This assumes that allocating I/O is
performed in cluster granularity - which is true for qcow2, qed, etc.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
f4658285f9 block: wait for overlapping requests
When copy-on-read is enabled it is necessary to wait for overlapping
requests before issuing new requests.  This prevents races between the
copy-on-read and a write request.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
53fec9d3fd block: add interface to toggle copy-on-read
The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can
be used to programmatically enable or disable copy-on-read for a block
device.  Later patches add the actual copy-on-read logic.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
dbffbdcfff block: add request tracking
The block layer does not know about pending requests.  This information
is necessary for copy-on-read since overlapping requests must be
serialized to prevent races that corrupt the image.

The BlockDriverState gets a new tracked_request list field which
contains all pending requests.  Each request is a BdrvTrackedRequest
record with sector_num, nb_sectors, and is_write fields.

Note that request tracking is always enabled but hopefully this extra
work is so small that it doesn't justify adding an enable/disable flag.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
060f51c9de block: add bdrv_co_is_allocated() interface
This patch introduces the public bdrv_co_is_allocated() interface which
can be used to query image allocation status while the VM is running.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi
6aebab140d block: drop .bdrv_is_allocated() interface
Now that all block drivers have been converted to
.bdrv_co_is_allocated() we can drop .bdrv_is_allocated().

Note that the public bdrv_is_allocated() interface is still available
but is in fact a synchronous wrapper around .bdrv_co_is_allocated().

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi
376ae3f1cb block: add .bdrv_co_is_allocated()
This patch adds the .bdrv_co_is_allocated() interface which is identical
to .bdrv_is_allocated() but runs in coroutine context.  Running in
coroutine context implies that other coroutines might be performing I/O
at the same time.   Therefore it must be safe to run while the following
BlockDriver functions are in-flight:

    .bdrv_co_readv()
    .bdrv_co_writev()
    .bdrv_co_flush()
    .bdrv_co_is_allocated()

The new .bdrv_co_is_allocated() interface is useful because it can be
used when a VM is running, whereas .bdrv_is_allocated() is a synchronous
interface that does not cope with parallel requests.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:36 +01:00
Stefan Hajnoczi
05c4af54c6 block: use public bdrv_is_allocated() interface
There is no need for bdrv_commit() to use the BlockDriver
.bdrv_is_allocated() interface directly.  Converting to the public
interface gives us the freedom to drop .bdrv_is_allocated() entirely in
favor of a new .bdrv_co_is_allocated() in the future.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:36 +01:00
Zhi Yong Wu
727f005e6a hmp/qmp: add block_set_io_throttle
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Zhi Yong Wu
98f90dba5e block: add I/O throttling algorithm
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Zhi Yong Wu
0563e19151 block: add the blockio limits command line support
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Anthony Liguori
0f15423c32 block: allow migration to work with image files (v3)
Image files have two types of data: immutable data that describes things like
image size, backing files, etc. and mutable data that includes offset and
reference count tables.

Today, image formats aggressively cache mutable data to improve performance.  In
some cases, this happens before a guest even starts.  When dealing with live
migration, since a file is open on two machines, the caching of meta data can
lead to data corruption.

This patch addresses this by introducing a mechanism to invalidate any cached
mutable data a block driver may have which is then used by the live migration
code.

NB, this still requires coherent shared storage.  Addressing migration without
coherent shared storage (i.e. NFS) requires additional work.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-21 14:58:48 -06:00
Kevin Wolf
ca716364f0 block: Make cache=unsafe flush to the OS
cache=unsafe completely ignored bdrv_flush, because flushing the host disk
costs a lot of performance. However, this means that qcow2 images (and
potentially any other format) can lose data even after the guest has issued a
flush if the qemu process crashes/is killed. In case of a host crash, data loss
is certainly expected with cache=unsafe, but if just the qemu process dies this
is a bit too unsafe.

Now that we have two separate flush functions, we can choose to flush
everythign to the OS, but don't enforce that it's physically written to the
disk.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Kevin Wolf
eb489bb1ec block: Introduce bdrv_co_flush_to_os
qcow2 has a writeback metadata cache, so flushing a qcow2 image actually
consists of writing back that cache to the protocol and only then flushes the
protocol in order to get everything stable on disk.

This introduces a separate bdrv_co_flush_to_os to reflect the split.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Kevin Wolf
c68b89acd6 block: Rename bdrv_co_flush to bdrv_co_flush_to_disk
There are two different types of flush that you can do: Flushing one level up
to the OS (i.e. writing data to the host page cache) or flushing it all the way
down to the disk. The existing functions flush to the disk, reflect this in the
function name.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Paolo Bonzini
025ccaa7f9 block: add eject request callback
Recent versions of udev always keep the tray locked so that the kernel
can observe "eject request" events (aka tray button presses) even on
discs that aren't mounted.  Add support for these events in the ATAPI
and SCSI cd drive device models.

To let management cope with the behavior of udev, an event should also
be added for "tray opened/closed".  This way, after issuing an "eject"
command, management can poll until the guests actually reacts to the
command.  They can then issue the "change" command after the tray has been
opened, or try with "eject -f" after a (configurable?) timeout.  However,
with this patch and the corresponding support in the device models,
at least it is possible to do a manual two-step eject+change sequence.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:57 +01:00
Anthony Liguori
8494a397b6 Merge remote-tracking branch 'kwolf/for-anthony' into staging
Conflicts:
	block/vmdk.c
2011-10-31 11:09:00 -05:00
Stefan Hajnoczi
03f541bd6e block: reinitialize across bdrv_close()/bdrv_open()
Several BlockDriverState fields are not being reinitialized across
bdrv_close()/bdrv_open().  Make sure they are reset to their default
values.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:50 +02:00
Stefan Hajnoczi
e7c637967e block: set bs->read_only before .bdrv_open()
Several block drivers set bs->read_only in .bdrv_open() but
block.c:bdrv_open_common() clobbers its value.  Additionally, QED uses
bdrv_is_read_only() in .bdrv_open() to decide whether to perform
consistency checks.

The correct ordering is to initialize bs->read_only from the open flags
before calling .bdrv_open().  This way block drivers can override it if
necessary and can use bdrv_is_read_only() in .bdrv_open().

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:49 +02:00
Kevin Wolf
2b5728164f block: Fix bdrv_open use after free
tmp_filename was used outside the block it was defined in, i.e. after it went
out of scope. Move its declaration to the top level.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:49 +02:00
Kevin Wolf
3574c60819 block: Remove dead code
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:49 +02:00
Luiz Capitulino
f795e743bd Drop qemu-objects.h from modules that don't require it
Previous commits dropped most qobjects usage from qemu modules
(now they are a low level interface used by the QAPI). However,
some modules still include the qemu-objects.h header file.

This commit drops qemu-objects.h from some of those modules
and includes qjson.h instead, which is what they actually need.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
f11f57e405 qapi: Convert query-blockstats
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
b202381800 qapi: Convert query-block
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
58e21ef5ab block: Rename the BlockIOStatus enum values
The biggest change is to rename its prefix from BDRV_IOS to
BLOCK_DEVICE_IO_STATUS.

Next commit will convert the query-block command to the QAPI
and that's how the enumeration is going to be generated.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
d6bf279e7a block: iostatus: Drop BDRV_IOS_INVAL
A future commit will convert bdrv_info() to the QAPI and it won't
provide IOS_INVAL.

Luckily all we have to do is to add a new 'iostatus_enabled'
member to BlockDriverState and use it instead.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Paolo Bonzini
6db39ae2e2 block: change discard to co_discard
Since coroutine operation is now mandatory, convert both bdrv_discard
implementations to coroutines.  For qcow2, this means taking the lock
around the operation.  raw-posix remains synchronous.

The bdrv_discard callback is then unused and can be eliminated.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:14 +02:00
Paolo Bonzini
8b94ff8573 block: change flush to co_flush
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines.  For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.

The bdrv_flush callback is then unused and can be eliminated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:14 +02:00
Paolo Bonzini
4265d620c5 block: add bdrv_co_discard and bdrv_aio_discard support
This similarly adds support for coroutine and asynchronous discard.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:13 +02:00
Paolo Bonzini
07f0761574 block: unify flush implementations
Add coroutine support for flush and apply the same emulation that
we already do for read/write.  bdrv_aio_flush is simplified to always
go through a coroutine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:13 +02:00
Paolo Bonzini
35246a6825 block: rename bdrv_co_rw_bh
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:12 +02:00
Stefan Hajnoczi
09f085d59d block: drop bdrv_has_async_rw()
Commit cd74d83345e0e3b708330ab8c4cd9111bb82cda6 ("block: switch
bdrv_read()/bdrv_write() to coroutines") removed the bdrv_has_async_rw()
callers.  This patch removes bdrv_has_async_rw() since it is no longer
used.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-14 17:31:22 +02:00
Stefan Hajnoczi
f8c35c1d59 block: drop .bdrv_read()/.bdrv_write() emulation
There is no need to emulate .bdrv_read()/.bdrv_write() since these
interfaces are only called if aio and coroutine interfaces are not
present.  All valid BlockDrivers must implement either sync, aio, or
coroutine interfaces.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-14 17:31:22 +02:00
Stefan Hajnoczi
8c5873d697 block: drop emulation functions that use coroutines
Block drivers that implement coroutine functions used to get sync and
aio wrappers.  This is no longer necessary since all request processing
now happens in a coroutine.  If a block driver implements the coroutine
interface then none of the other interfaces will be invoked.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-14 17:31:22 +02:00
Stefan Hajnoczi
1a6e115b19 block: switch bdrv_aio_writev() to coroutines
More sync, aio, and coroutine unification.  Make bdrv_aio_writev() go
through coroutine request processing.

Remove the dirty block callback mechanism which was needed only for aio
processing and can be done more naturally in coroutine context.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:54 +02:00
Stefan Hajnoczi
6b7cb2479b block: mark blocks dirty on coroutine write completion
The aio write operation marks blocks dirty when the write operation
completes.  The coroutine write operation marks blocks dirty before
issuing the write operation.

It seems safest to mark the block dirty when the operation completes so
that anything tracking dirty blocks will not act before the change has
been made to the image file.

Make the coroutine write operation dirty blocks on write completion.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:54 +02:00
Stefan Hajnoczi
b2a6137166 block: switch bdrv_aio_readv() to coroutines
More sync, aio, and coroutine unification.  Make bdrv_aio_readv() go
through coroutine request processing.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:54 +02:00
Stefan Hajnoczi
1c9805a398 block: switch bdrv_read()/bdrv_write() to coroutines
The bdrv_read()/bdrv_write() functions call .bdrv_read()/.bdrv_write().
They should go through bdrv_co_do_readv() and bdrv_co_do_writev()
instead in order to unify request processing code across sync, aio, and
coroutine interfaces.  This is also an important step towards removing
BlockDriverState .bdrv_read()/.bdrv_write() in the future.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:53 +02:00
Stefan Hajnoczi
c5fbe57111 block: split out bdrv_co_do_readv() and bdrv_co_do_writev()
The public interface for I/O in coroutine context is bdrv_co_readv() and
bdrv_co_writev().  Split out the request processing code into
bdrv_co_do_readv() and bdrv_co_writev() so that it can be called
internally when we refactor all request processing to use coroutines.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:53 +02:00
Stefan Hajnoczi
1ed20acf2f block: directly invoke .bdrv_* from emulation functions
The emulation functions which supply default BlockDriver .bdrv_*()
functions given another implemented .bdrv_*() function should not use
public bdrv_*() interfaces.  This patch ensures they invoke .bdrv_*()
directly to avoid adding an extra layer of coroutine request processing
and possibly entering an infinite loop.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:53 +02:00
Stefan Hajnoczi
a652d16025 block: directly invoke .bdrv_aio_*() in bdrv_co_io_em()
We will unify block layer request processing across sync, aio, and
coroutines and this means a .bdrv_co_*() emulation function should not
call back into the public interface.  There's no need here, just call
.bdrv_aio_*() directly.

The gory details: bdrv_co_io_em() cannot call back into the public
bdrv_aio_*() interface since that will be handled using coroutines,
which causes us to call into bdrv_co_io_em() again in an infinite loop
:).

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:27 +02:00
Luiz Capitulino
d2078cc238 HMP: Print 'io-status' information
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-11 09:42:45 +02:00
Luiz Capitulino
f04ef60100 QMP: query-status: Add 'io-status' key
Contains the I/O status for the given device. The key is only present
if the device supports it and the VM is configured to stop on errors.

Please, check the documentation being added in this commit for more
information.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-11 09:42:45 +02:00
Luiz Capitulino
28a7282a5d block: Keep track of devices' I/O status
This commit adds support to the BlockDriverState type to keep track
of devices' I/O status.

There are three possible status: BDRV_IOS_OK (no error), BDRV_IOS_ENOSPC
(no space error) and BDRV_IOS_FAILED (any other error). The distinction
between no space and other errors is important because a management
application may want to watch for no space in order to extend the
space assigned to the VM and put it to run again.

Qemu devices supporting the I/O status feature have to enable it
explicitly by calling bdrv_iostatus_enable() _and_ have to be
configured to stop the VM on errors (ie. werror=stop|enospc or
rerror=stop).

In case of multiple errors being triggered in sequence only the first
one is stored. The I/O status is always reset to BDRV_IOS_OK when the
'cont' command is issued.

Next commits will add support to some devices and extend the
query-block/info block commands to return the I/O status information.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-11 09:41:47 +02:00
Stefan Hajnoczi
59370aaa56 trace: add arguments to bdrv_co_io_em() trace event
It is useful to know the BlockDriverState as well as the
sector_num/nb_sectors of an emulated .bdrv_co_*() request.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-10-03 10:56:27 +01:00
Stefan Hajnoczi
28dcee10c5 trace: trace bdrv_open_common()
bdrv_open_common() is a useful point to trace since it reveals the
filename and block driver for a given BlockDriverState.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-10-03 10:55:50 +01:00
Markus Armbruster
7d4b4ba5c2 block: New change_media_cb() parameter load
To let device models distinguish between eject and load.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Markus Armbruster
29e05f2022 block: Reset buffer alignment on detach
BlockDriverState member buffer_alignment is initially 512.  The device
model may set them, with bdrv_set_buffer_alignment().  If the device
model gets detached (hot unplug), the device's alignment is left
behind.  Only okay because device hot unplug automatically destroys
the BlockDriverState.  But that's a questionable feature, best not to
rely on it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Markus Armbruster
7b6f9300d5 block: New bdrv_set_buffer_alignment()
Device models should be able to set it without an unclean include of
block_int.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Markus Armbruster
e4def80b36 block: Show whether the virtual tray is open in info block
Need to ask the device, so this requires new BlockDevOps member
is_tray_open().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster
9e6a4c9177 block: Drop BlockDriverState member removable
It's a confused mess (see previous commit).  No users remain.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster
2c6942fa7b block: Clean up remaining users of "removable"
BlockDriverState member removable is a confused mess.  It is true when
an ide-cd, scsi-cd or floppy qdev is attached, or when the
BlockDriverState was created with -drive if={floppy,sd} or -drive
if={ide,scsi,xen,none},media=cdrom ("created removable"), except when
an ide-hd, scsi-hd, scsi-generic or virtio-blk qdev is attached.

Three users remain:

1. eject_device(), via bdrv_is_removable() uses it to determine
   whether a block device can eject media.

2. bdrv_info() is monitor command "info block".  QMP documentation
   says "true if the device is removable, false otherwise".  From the
   monitor user's point of view, the only sensible interpretation of
   "is removable" is "can eject media with monitor commands eject and
   change".

A block device can eject media unless a device is attached that
doesn't support it.  Switch the two users over to new
bdrv_dev_has_removable_media() that returns exactly that.

3. bdrv_getlength() uses to suppress its length cache when media can
   change (see commit 46a4e4e6).  Media change is either monitor
   command change (updates the length cache), monitor command eject
   (doesn't update the length cache, easily fixable), or physical
   media change (invalidates length cache, not so easily fixable).

I'm refraining from improving anything here, because this series is
long enough already.  Instead, I simply switch it over to
bdrv_dev_has_removable_media() as well.

This changes the behavior of the length cache and of monitor commands
eject and change in two cases:

a. drive not created removable, no device attached

   The commit makes the drive removable, and defeats the length cache.

   Example: -drive if=none

b. drive created removable, but the attached drive is non-removable,
   and doesn't call bdrv_set_removable(..., 0) (most devices don't)

   The commit makes the drive non-removable, and enables the length
   cache.

   Example: -drive if=xen,media=cdrom -M xenpv

   The other non-removable devices that don't call
   bdrv_set_removable() can't currently use a drive created removable,
   either because they aren't qdevified, or because they lack a drive
   property.  Won't stay that way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster
025e849a50 block: Rename bdrv_set_locked() to bdrv_lock_medium()
While there, make the locked parameter bool.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
f107639a6f block: Drop medium lock tracking, ask device models instead
Requires new BlockDevOps member is_medium_locked().  Implement for IDE
and SCSI CD-ROMs.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
fdec4404dd block: Leave enforcing tray lock to device models
The device model knows best when to accept the guest's eject command.
No need to detour through the block layer.

bdrv_eject() can't fail anymore.  Make it void.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
22cf56c4d8 block: Drop tray status tracking, no longer used
Commit 4be9762a is now completely redone.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
a1aff5bf67 block: Revert entanglement of bdrv_is_inserted() with tray status
Commit 4be9762a changed bdrv_is_inserted() to fail when the tray is
open.  Unfortunately, there are two different kinds of users, with
conflicting needs.

1. Device models using bdrv_eject(), currently ide-cd and scsi-cd.
They expect bdrv_is_inserted() to reflect the tray status.  Commit
4be9762a makes them happy.

2. Code that wants to know whether a BlockDriverState has media, such
as find_image_format(), bdrv_flush_all().  Commit 4be9762a makes them
unhappy.  In particular, it breaks flush on VM stop for media ejected
by the guest.

Revert the change to bdrv_is_inserted().  Check the tray status in the
device models instead.

Note on IDE: Since only ATAPI devices have a tray, and they don't
accept ATA commands since the recent commit "ide: Reject ATA commands
specific to drive kinds", checking in atapi.c suffices.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
07b70bfbb3 savevm: Include writable devices with removable media
savevm and loadvm silently ignore block devices with removable media,
such as floppies and SD cards.  Rolling back a VM to a previous
checkpoint will *not* roll back writes to block devices with removable
media.

Moreover, bdrv_is_removable() is a confused mess, and wrong in at
least one case: it considers "-drive if=xen,media=cdrom -M xenpv"
removable.  It'll be cleaned up later in this series.

Read-only block devices are also ignored, but that's okay.

Fix by ignoring only read-only block devices and empty block devices.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:07 +02:00
Markus Armbruster
c602a489f9 block: Clean up bdrv_flush_all()
Change (!bdrv_is_removable(bs) || bdrv_is_inserted(bs)) to just
bdrv_is_inserted().  Rationale:

    The value of bdrv_is_removable(bs) matters only when
    bdrv_is_inserted(bs) is false.

    bdrv_is_inserted(bs) is true when bs is open (bs->drv != NULL) and
    not an empty host drive (CD-ROM or floppy).

    Therefore, bdrv_is_removable(bs) matters only when:

    1. bs is not open
       old: may call bdrv_flush(bs), which does nothing
       new: won't call

    2. bs is an empty host drive
       old: may call bdrv_flush(bs), which calls driver method
            raw_flush(), which calls fdatasync() or equivalent, which
            can't do anything useful while the drive is empty
       new: won't call

Result is bs->drv && !bdrv_is_read_only(bs) && bdrv_is_inserted(bs).
bdrv_is_inserted(bs) implies bs->drv.  Drop the redundant test.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:07 +02:00
Markus Armbruster
8e49ca4624 block: Leave tracking media change to device models
hw/fdc.c is the only one that cares.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:06 +02:00
Markus Armbruster
145feb176f block: Split change_cb() into change_media_cb(), resize_cb()
Multiplexing callbacks complicates matters needlessly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Markus Armbruster
0e49de5232 block: Generalize change_cb() to BlockDevOps
So we can more easily add device model callbacks.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Markus Armbruster
fa879d62eb block: Attach non-qdev devices as well
For now, this just protects against programming errors like having the
same drive back multiple non-qdev devices, or untimely bdrv_delete().
Later commits will add other interesting uses.

While there, rename BlockDriverState member peer to dev, bdrv_attach()
to bdrv_attach_dev(), bdrv_detach() to bdrv_detach_dev(), and
bdrv_get_attached() to bdrv_get_attached_dev().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Stefan Weil
541dc0d47f Use new macro QEMU_PACKED for packed structures
Most changes were made using these commands:

git grep -la '__attribute__((packed))'|xargs perl -pi -e 's/__attribute__\(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((packed))'|xargs perl -pi -e 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__((__packed__))'|xargs perl -pi -e 's/__attribute__\(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((__packed__))'|xargs perl -pi -e 's/__attribute__ \(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute((packed))'|xargs perl -pi -e 's/__attribute\(\(packed\)\)/QEMU_PACKED/'

Whitespace in linux-user/syscall_defs.h was fixed manually
to avoid warnings from scripts/checkpatch.pl.

Manual changes were also applied to hw/pc.c.

I did not fix indentation with tabs in block/vvfat.c.
The patch will show 4 errors with scripts/checkpatch.pl.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-09-03 10:45:59 +00:00
Christoph Hellwig
c488c7f649 block: latency accounting
Account the total latency for read/write/flush requests.  This allows
management tools to average it based on a snapshot of the nr ops
counters and allow checking for SLAs or provide statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-26 18:18:38 +02:00
Christoph Hellwig
a597e79ce1 block: explicit I/O accounting
Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.

This means:
 - we do not count internal requests from image formats in addition
   to guest originating I/O
 - we do not double count I/O ops if the device model handles it
   chunk wise
 - we only account I/O once it actuall is done
 - can extent I/O accounting to synchronous or coroutine I/O easily
 - implement I/O latency tracking easily (see the next patch)

I've conveted the existing device model callers to the new model,
device models that are using synchronous I/O and weren't accounted
before haven't been updated yet.  Also scsi hasn't been converted
to the end-to-end accounting as I want to defer that after the pending
scsi layer overhaul.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 18:18:42 +02:00
Christoph Hellwig
e8045d6726 block: include flush requests in info blockstats
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Stefan Hajnoczi
92196b2f56 block: add cache=directsync parameter to -drive
This patch adds -drive cache=directsync for O_DIRECT | O_SYNC host file
I/O with no disk write cache presented to the guest.

This mode is useful when guests may not be sending flushes when
appropriate and therefore leave data at risk in case of power failure.
When cache=directsync is used, write operations are only completed to
the guest when data is safely on disk.

This new mode is like cache=writethrough but it bypasses the host page
cache.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Stefan Hajnoczi
c3993cdca3 block: parse cache mode flags in a single place
This patch introduces bdrv_parse_cache_flags() which sets open flags
given a cache mode.  Previously this was duplicated in blockdev.c and
qemu-img.c.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Robert Wang
d62b5dea30 fix code format
Fix code format to make checkpatch.pl happy.

Signed-off-by: Robert Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 10:17:52 -05:00
Anthony Liguori
7267c0947d Use glib memory allocation and free functions
qemu_malloc/qemu_free no longer exist after this commit.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-20 23:01:08 -05:00
Kevin Wolf
e7a8a7837a block: Use bdrv_co_* instead of synchronous versions in coroutines
If we're already in a coroutine, there is no reason to use the synchronous
version of block layer functions when a coroutine one exists. This makes
bdrv_read/write/flush use bdrv_co_* when used inside a coroutine.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-04 11:27:15 +02:00
Kevin Wolf
384acbf46b async: Remove AsyncContext
The purpose of AsyncContexts was to protect qcow and qcow2 against reentrancy
during an emulated bdrv_read/write (which includes a qemu_aio_wait() call and
can run AIO callbacks of different requests if it weren't for AsyncContexts).

Now both qcow and qcow2 are protected by CoMutexes and AsyncContexts can be
removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:41 +02:00
Kevin Wolf
f9f05dc58c block: Add bdrv_co_readv/writev emulation
In order to be able to call bdrv_co_readv/writev for drivers that don't
implement the functions natively, add an emulation that uses the AIO functions
to implement them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Kevin Wolf
6848542018 block: Emulate AIO functions with bdrv_co_readv/writev
Use the bdrv_co_readv/writev callbacks to implement bdrv_aio_readv/writev and
bdrv_read/write if a driver provides the coroutine version instead of the
synchronous or AIO version.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Kevin Wolf
da1fa91d6c block: Add bdrv_co_readv/writev
Add new block driver callbacks bdrv_co_readv/writev, which work on a
QEMUIOVector like bdrv_aio_*, but don't need a callback. The function may only
be called inside a coroutine, so a block driver implementing this interface can
yield instead of blocking during I/O.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Frediano Ziglio
5bf3f8e4f7 block: Removed unused function bdrv_write_sync
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:29 +02:00
Markus Armbruster
49aa46bb4b block: Don't let locked flag prevent medium load
Commit aea2a33c made bdrv_eject() obey the locked flag.  Correct for
medium eject (eject_flag set), incorrect for medium load (eject_flag
clear).  See MMC-5 Table 341 "Actions for Lock/Unlock/Eject".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Markus Armbruster
822e1cd17e block: Make BlockDriver method bdrv_eject() return void
Callees always return 0, except for FreeBSD's cdrom_eject(), which
returns -ENOTSUP when the device is in a terminally wedged state.

The only caller is bdrv_eject(), and it maps -ENOTSUP to 0 since
commit 4be9762a.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Markus Armbruster
a19712b0db block: Reset device model callbacks on detach
BlockDriverState members change_cb and change_opaque are initially
null.  The device model may set them, with bdrv_set_change_cb().  If
the device model gets detached (hot unplug), they're left dangling.
Only safe because device hot unplug automatically destroys the
BlockDriverState.  But that's a questionable feature, best not to rely
on it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:09:11 +02:00
Fam Zheng
4a1d5e1fde block: add bdrv_get_allocated_file_size() operation
qemu-img.c wants to count allocated file size of image. Previously it
counts a single bs->file by 'stat' or Window API. As VMDK introduces
multiple file support, the operation becomes format specific with
platform specific meanwhile.

The functions are moved to block/raw-{posix,win32}.c and qemu-img.c calls
bdrv_get_allocated_file_size to count the bs. And also added VMDK code
to count his own extents.

Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-07-19 15:39:08 +02:00
Kevin Wolf
d220894e02 bdrv_img_create: Fix segfault
Block drivers that don't support creating images don't have a size option. Fail
gracefully instead of segfaulting when trying to access the option's value.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-08 11:56:40 +02:00
Christoph Hellwig
a659979328 block: clarify the meaning of BDRV_O_NOCACHE
Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
but no writeback semantics.  All existing callers are changed to also
specify BDRV_O_CACHE_WB to give them writeback semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-08 10:39:32 +02:00
Markus Armbruster
8d278467ff block: Remove type hint, it's guest matter, doesn't belong here
No users of bdrv_get_type_hint() left.  bdrv_set_type_hint() can make
the media removable by side effect.  Make that explicit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-19 10:26:23 +02:00
Markus Armbruster
d8aeeb31d5 block QMP: Deprecate query-block's "type", drop info block's "type="
query-block's specification documents response member "type" with
values "hd", "cdrom", "floppy", "unknown".

Its value is unreliable: a block device used as floppy has type
"floppy" if created with if=floppy, but type "hd" if created with
if=none.

That's because with if=none, the type is at best a declaration of
intent: the drive can be connected to any guest device.  Its type is
really the guest device's business.  Reporting it here is wrong.

No known user of QMP uses "type".  It's unlikely that any unknown
users exist, because its value is useless unless you know how the
block device was created.  But then you also know the true value.

Fixing the broken value risks breaking (hypothetical!) clients that
somehow rely on the current behavior.  Not fixing the value risks
breaking (hypothetical!) clients that rely on the value to be
accurate.  Can't entirely avoid hypothetical lossage.  Change the
value to be always "unknown".

This makes "info block" always report "type=unknown".  Pointless.
Change it to not report the type.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-19 10:26:19 +02:00
Stefan Weil
a1c7273b82 Fix typos in comments and code (occured -> occurred and related)
The code changed here is an unused data type name (evt_flush_occurred).

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-05-08 10:02:18 +01:00
Stefan Weil
ebabb67a17 Fix typo in code and comments
Replace writeable -> writable

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-05-06 08:19:25 +01:00
Stefan Hajnoczi
46a4e4e608 block: Do not cache device size for removable media
The block layer caches the device size to avoid doing lseek(fd, 0,
SEEK_END) every time this value is needed.  For removable media the
device size becomes stale if a new medium is inserted.  This patch
simply prevents device size caching for removable media.

A smarter solution is to update the cached device size when a new medium
is inserted.  Given that there are currently bugs with CD-ROM media
change I do not want to implement that approach until we've gotten
things correct first.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-07 13:51:47 +02:00
Stefan Hajnoczi
b8c6d09589 trace: Trace bdrv_set_locked()
It can be handy to know when the guest locks/unlocks the CD-ROM tray.
This trace event makes that possible.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-07 13:51:47 +02:00
Ryan Harper
d22b2f41c4 Do not delete BlockDriverState when deleting the drive
When removing a drive from the host-side via drive_del we currently have
the following path:

drive_del
qemu_aio_flush()
bdrv_close()    // zaps bs->drv, which makes any subsequent I/O get
                // dropped.  Works as designed
drive_uninit()
bdrv_delete()   // frees the bs.  Since the device is still connected to
                // bs, any subsequent I/O is a use-after-free.

The value of bs->drv becomes unpredictable on free.  As long as it
remains null, I/O still gets dropped, however it could become non-null
at any point after the free resulting SEGVs or other QEMU state
corruption.

To resolve this issue as simply as possible, we can chose to not
actually delete the BlockDriverState pointer.  Since bdrv_close()
handles setting the drv pointer to NULL, we just need to remove the
BlockDriverState from the QLIST that is used to enumerate the block
devices.  This is currently handled within bdrv_delete, so move this
into its own function, bdrv_make_anon().

The result is that we can now invoke drive_del, this closes the file
descriptors and sets BlockDriverState->drv to NULL which prevents futher
IO to the device, and since we do not free BlockDriverState, we don't
have to worry about the copy retained in the block devices.

We also don't attempt to remove the qdev property since we are no longer
deleting the BlockDriverState on drives with associated drives.  This
also allows for removing Drives with no devices associated either.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-07 13:51:47 +02:00
Ryan Harper
301db7c2dd Don't allow multiwrites against a block device without underlying medium
If the block device has been closed, we no longer have a medium to submit
IO against, check for this before submitting io.  This prevents a segfault
further in the code where we dereference elements of the block driver.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-03-15 13:21:14 +01:00
Stefan Hajnoczi
a13aac04e1 trace: Trace bdrv_aio_flush()
Add a trace event for bdrv_aio_flush() to complement the existing
bdrv_aio_readv() and bdrv_aio_writev() events.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-03-07 15:34:42 +00:00
Blue Swirl
5bbdbb4676 fdc: move floppy geometry guessing to block.c
Other geometry guessing functions already reside in block.c.

Remove some unused or debugging only fields.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-02-20 09:33:17 +00:00
Marcelo Tosatti
8591675f44 block: enable in_use flag
Set block device in use during block migration, disallow drive_del and
bdrv_truncate for in use devices.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-07 12:51:19 +01:00
Marcelo Tosatti
db593f2565 Add flag to indicate external users to block device
Certain operations such as drive_del or resize cannot be performed
while external users (eg. block migration) reference the block device.

Add a flag to indicate that.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-07 12:51:19 +01:00
Christoph Hellwig
db97ee6a97 block: tell drivers about an image resize
Extend the change_cb callback with a reason argument, and use it
to tell drivers about size changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00
Stefan Hajnoczi
96df67d1c3 block: Use backing format driver during image creation
The backing format should be honored during image creation.  For some
reason we currently use the image format to open the backing file.  This
fails when the backing file has a different format than the image being
created.  Keep the image and backing format drivers completely separate.

Also print the backing filename if there is an error opening the backing
file instead of the image filename.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 16:49:50 +01:00
Blue Swirl
71df0eeb98 block: delete a write-only variable
Avoid a warning with GCC 4.6.0:
/src/qemu/block.c: In function 'bdrv_img_create':
/src/qemu/block.c:2862:25: error: variable 'fmt' set but not used [-Werror=unused-but-set-variable]

CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-01-06 18:25:37 +00:00
Christoph Hellwig
bb8bf76fb1 block: add discard support
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard.  If no discard
granularity support is set discard support is disabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Jes Sorensen
4f70f249ca bdrv_img_create() use proper errno return values
Kevin suggested to have bdrv_img_create() return proper -errno values
on error.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Jes Sorensen
792da93a63 Prevent creating an image with the same filename as backing file
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Jes Sorensen
f88e1a4201 qemu-img.c: Re-factor img_create()
This patch re-factors img_create() moving the code doing the actual
work into block.c where it can be shared with QEMU. This is needed to
be able to create images from QEMU to be used for live snapshots.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Stefan Hajnoczi
df2dbb4a50 block: Fix the use of protocols in backing files
Backing filenames may contain a protocol.  The code currently doesn't
consider this case and produces filenames that embed "<protocol>:".
Don't combine filenames if the backing filename contains a protocol.

Based on an earlier patch by Anthony Liguori <aliguori@us.ibm.com>.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:10:59 +01:00
Stefan Hajnoczi
9e0b22f4f2 block: Introduce path_has_protocol() function
The bdrv_find_protocol() function returns NULL if an unknown protocol
name is given.  It returns the "file" protocol when the filename
contains no protocol at all.  This makes it difficult to distinguish
between paths which contain a protocol and those which do not.

Factor out a helper function that tests whether or not a filename has a
protocol.  The next patch makes use of this function.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:10:59 +01:00
Stefan Hajnoczi
16905d7175 block: Make bdrv_create_file() ':' handling consistent
Filenames may start with "<protocol>:" to explicitly use a protocol like
nbd.  Filenames with unknown protocols are rejected in most of QEMU
except for bdrv_create_file().  Even if a file with an invalid filename
can be created, QEMU cannot use it since all the other relevant
functions reject such paths.  Make bdrv_create_file() consistent.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-14 15:44:21 +01:00
Marcelo Tosatti
4dcafbb1eb block: set sector dirty on AIO write completion
Sectors are marked dirty in the bitmap on AIO submission. This is wrong
since data has not reached storage.

Set a given sector as dirty in the dirty bitmap on AIO completion, so that
reading a sector marked as dirty is guaranteed to return uptodate data.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-11-21 09:16:56 -06:00
Marcelo Tosatti
6d59fec11e block: fix shift in dirty bitmap calculation
Otherwise upper 32 bits of bitmap entries are not correctly calculated.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-11-21 09:16:56 -06:00
Kevin Wolf
205ef7961f block: Allow bdrv_flush to return errors
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-11-04 12:52:16 +01:00
edison
51ef67270b Copy snapshots out of QCOW2 disk
In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage.
The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img.
Right now, it only supports to copy the full snapshot, delta snapshot is on the way.

Changes from V1: all the comments from Kevin are addressed:
Add read-only checking
Fix coding style
Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp

Signed-off-by: Disheng Su <edison@cloud.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Stefan Hajnoczi
bbf0a44081 trace: Trace bdrv_aio_{readv,writev}
Observing block layer aio readv/writev operations is useful for
debugging image formats or understanding guest disk I/O patterns.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-10-09 08:17:03 +00:00
Stefan Hajnoczi
6d519a5f95 trace: Trace virtio-blk, multiwrite, and paio_submit
This patch adds trace events that make it possible to observe
virtio-blk.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-09-09 16:22:45 -05:00
Anthony Liguori
8b33d9eeba Revert "Make default invocation of block drivers safer (v3)"
This reverts commit 79368c81bf.

Conflicts:

	block.c

I haven't been able to come up with a solution yet for the corruption caused by
unaligned requests from the IDE disk so revert until a solution can be written.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-09-08 17:09:15 -05:00
Kevin Wolf
ee1811965f block: Fix image re-open in bdrv_commit
Arguably we should re-open the backing file with the backing file format and
not with the format of the snapshot image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Markus Armbruster
4be9762adb block: Change bdrv_eject() not to drop the image
bdrv_eject() gets called when a device model opens or closes the tray.

If the block driver implements method bdrv_eject(), that method gets
called.  Drivers host_cdrom implements it, and it opens and closes the
physical tray, and nothing else.  When a device model opens, then
closes the tray, media changes only if the user actively changes the
physical media while the tray is open.  This is matches how physical
hardware behaves.

If the block driver doesn't implement method bdrv_eject(), we do
something quite different: opening the tray severs the connection to
the image by calling bdrv_close(), and closing the tray does nothing.
When the device model opens, then closes the tray, media is gone,
unless the user actively inserts another one while the tray is open,
with a suitable change command in the monitor.  This isn't how
physical hardware behaves.  Rather inconvenient when programs
"helpfully" eject media to give you a chance to change it.  The way
bdrv_eject() behaves here turns that chance into a must, which is not
what these programs or their users expect.

Change the default action not to call bdrv_close().  Instead, note the
tray status in new BlockDriverState member tray_open.  Use it in
bdrv_is_inserted().

Arguably, the device models should keep track of tray status
themselves.  But this is less invasive.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Kevin Wolf
336c1c1255 block: Fix bdrv_has_zero_init
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Kevin Wolf
8a4266144e block: Change bdrv_commit to handle multiple sectors at once
bdrv_commit copies the image to its backing file sector by sector, which
is (surprise!) relatively slow. Let's take a larger buffer and handle more
sectors at once if possible.

With a 1G qcow2 file, this brought the time bdrv_commit takes down from
5:06 min to 1:14 min for me.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Blue Swirl
199630b62e Fix -snapshot deleting images on disk change
Block device change command did not copy BDRV_O_SNAPSHOT flag. Thus
the new image did not have this flag and the file got deleted during
opening.

Fix by copying BDRV_O_SNAPSHOT flag.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-26 13:39:40 +02:00
Stefan Weil
c98ac35d87 block: Use error codes from lower levels for error message
"No such file or directory" is a misleading error message
when a user tries to open a file with wrong permissions.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-26 13:39:40 +02:00
Anthony Liguori
79368c81bf Make default invocation of block drivers safer (v3)
CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-07-15 08:17:06 -05:00
Kevin Wolf
9ac228e02c qcow2/vdi: Change check to distinguish error cases
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:49 +02:00
Kevin Wolf
e076f3383b qemu-img check: Distinguish different kinds of errors
People think that their images are corrupted when in fact there are just some
leaked clusters. Differentiating several error cases should make the messages
more comprehensible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:48 +02:00
Kevin Wolf
de189a1b4a block: Handle multiwrite errors only when all requests have completed
Don't try to be clever by freeing all temporary data and calling all callbacks
when the return value (an error) is certain. Doing so has at least two
important problems:

* The temporary data that is freed (qiov, possibly zero buffer) is still used
  by the requests that have not yet completed.
* Calling the callbacks for all requests in the multiwrite means for the caller
  that it may free buffers etc. which are still in use.

Just remember the error value and do the cleanup when all requests have
completed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 15:44:12 +02:00
Kevin Wolf
453f9a1652 block: Fix early failure in multiwrite
bdrv_aio_writev may call the callback immediately (and it will commonly do so
in error cases). Current code doesn't consider this. For details see the
comment added by this patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 15:44:12 +02:00
Markus Armbruster
7d0d69509a block: Fix virtual media change for if=none
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed.  It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.

The type hint is only set by drive_init().  It sets BDRV_TYPE_FLOPPY
for if=floppy.  It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.

if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.

For the same reason, if=none works when it's used by ide-drive or
scsi-disk.  For other guest devices, there are problems:

* fdc: you can't change virtual media

    $ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
    QEMU 0.12.50 monitor - type 'help' for more information
    (qemu) eject foo
    Device 'foo' is not removable

  unless you add media=cdrom, but that makes it readonly.

* virtio: if you add media=cdrom, you can change virtual media.  If
  you eject, the guest gets I/O errors.  If you change, the guest sees
  the drive's contents suddenly change.

* scsi-generic: if you add media=cdrom, you can change virtual media.
  I didn't test what that does to the guest or the physical device,
  but it can't be pretty.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Markus Armbruster
3ac906f771 block: Clean up bdrv_snapshots()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Markus Armbruster
f9092b108f savevm: Survive hot-unplug of snapshot device
savevm.c keeps a pointer to the snapshot block device.  If you manage
to get that device deleted, the pointer dangles, and the next snapshot
operation will crash & burn.  Unplugging a guest device that uses it
does the trick:

    $ MALLOC_PERTURB_=234 qemu-system-x86_64 [...]
    QEMU 0.12.50 monitor - type 'help' for more information
    (qemu) info snapshots
    No available block device supports snapshots
    (qemu) drive_add auto if=none,file=tmp.qcow2
    OK
    (qemu) device_add usb-storage,id=foo,drive=none1
    (qemu) info snapshots
    Snapshot devices: none1
    Snapshot list (from none1):
    ID        TAG                 VM SIZE                DATE       VM CLOCK
    (qemu) device_del foo
    (qemu) info snapshots
    Snapshot devices:
    Segmentation fault (core dumped)

Move management of that pointer to block.c, and zap it when the device
it points becomes unusable.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Markus Armbruster
18846dee1a block: Catch attempt to attach multiple devices to a blockdev
For instance, -device scsi-disk,drive=foo -device scsi-disk,drive=foo
happily creates two SCSI disks connected to the same block device.
It's all downhill from there.

Device usb-storage deliberately attaches twice to the same blockdev,
which fails with the fix in place.  Detach before the second attach
there.

Also catch attempt to delete while a guest device model is attached.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Ryan Harper
15c7733bb2 Don't reset bs->is_temporary in bdrv_open_common
To fix https://bugs.launchpad.net/qemu/+bug/597402 where qemu fails to
call unlink() on temporary snapshots due to bs->is_temporary getting clobbered
in bdrv_open_common() after being set in bdrv_open() which calls the former.

We don't need to initialize bs->is_temporary in bdrv_open_common().

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:01 +02:00
Christoph Hellwig
39508e7adb block: allow filenames with colons again for host devices
Before the raw/file split we used to allow filenames with colons for host
device only.  While this was more by accident than by design people rely
on it, so we need to bring it back.

So move the host device probing to be before the protocol detection
again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:01 +02:00
Kevin Wolf
f08145fe16 block: Add bdrv_(p)write_sync
Add new functions that write and flush the written data to disk immediately.
This is what needs to be used for image format metadata to maintain integrity
for cache=... modes that don't use O_DSYNC. (Actually, we only need barriers,
and therefore the functions are defined as such, but flushes is what is
implemented in this patch - we can try to change that later)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Blue Swirl
5ffbbc67b5 block: fix a warning and possible truncation
Fix a warning from OpenBSD gcc (3.3.5 (propolice)):
/src/qemu/block.c: In function `bdrv_info_stats_bs':
/src/qemu/block.c:1548: warning: long long int format, long unsigned
int arg (arg 6)

There may be also truncation effects.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:42:30 +02:00
Markus Armbruster
2f399b0aad block: New bdrv_next()
This is a more flexible alternative to bdrv_iterate().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Markus Armbruster
6ab4b5ab8f block: Decouple block device "commit all" from DriveInfo
do_commit() and mux_proc_byte() iterate over the list of drives
defined with drive_init().  This misses host block devices defined by
other means.  Such means don't exist now, but will be introduced later
in this series.

Change them to use new bdrv_commit_all(), which iterates over all host
block devices.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Markus Armbruster
abd7f68d08 block: Move error actions from DriveInfo to BlockDriverState
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Miguel Di Ciurcio Filho
feeee5aca7 savevm: Really verify if a drive supports snapshots
Both bdrv_can_snapshot() and bdrv_has_snapshot() does not work as advertized.

First issue: Their names implies different porpouses, but they do the same thing
and have exactly the same code. Maybe copied and pasted and forgotten?
bdrv_has_snapshot() is called in various places for actually checking if there
is snapshots or not.

Second issue: the way bdrv_can_snapshot() verifies if a block driver supports or
not snapshots does not catch all cases. E.g.: a raw image.

So when do_savevm() is called, first thing it does is to set a global
BlockDriverState to save the VM memory state calling get_bs_snapshots().

static BlockDriverState *get_bs_snapshots(void)
{
    BlockDriverState *bs;
    DriveInfo *dinfo;

    if (bs_snapshots)
        return bs_snapshots;
    QTAILQ_FOREACH(dinfo, &drives, next) {
        bs = dinfo->bdrv;
        if (bdrv_can_snapshot(bs))
            goto ok;
    }
    return NULL;
 ok:
    bs_snapshots = bs;
    return bs;
}

bdrv_can_snapshot() may return a BlockDriverState that does not support
snapshots and do_savevm() goes on.

Later on in do_savevm(), we find:

    QTAILQ_FOREACH(dinfo, &drives, next) {
        bs1 = dinfo->bdrv;
        if (bdrv_has_snapshot(bs1)) {
            /* Write VM state size only to the image that contains the state */
            sn->vm_state_size = (bs == bs1 ? vm_state_size : 0);
            ret = bdrv_snapshot_create(bs1, sn);
            if (ret < 0) {
                monitor_printf(mon, "Error while creating snapshot on '%s'\n",
                               bdrv_get_device_name(bs1));
            }
        }
    }

bdrv_has_snapshot(bs1) is not checking if the device does support or has
snapshots as explained above. Only in bdrv_snapshot_create() the device is
actually checked for snapshot support.

So, in cases where the first device supports snapshots, and the second does not,
the snapshot on the first will happen anyways. I believe this is not a good
behavior. It should be an all or nothing process.

This patch addresses these issues by making bdrv_can_snapshot() actually do
what it must do and enforces better tests to avoid errors in the middle of
do_savevm(). bdrv_has_snapshot() is removed and replaced by bdrv_can_snapshot()
where appropriate.

bdrv_can_snapshot() was moved from savevm.c to block.c. It makes more sense to me.

The loadvm_state() function was updated too to enforce that when loading a VM at
least all writable devices must support snapshots too.

Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
MORITA Kazutaka
7cdb1f6d30 block: call the snapshot handlers of the protocol drivers
When snapshot handlers are not defined in the format driver, it is
better to call the ones of the protocol driver.  This enables us to
implement snapshot support in the protocol driver.

We need to call bdrv_close() and bdrv_open() handlers of the format
driver before and after bdrv_snapshot_goto() call of the protocol.  It is
because the contents of the block driver state may need to be changed
after loading vmstate.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:40 +02:00
MORITA Kazutaka
2bc93fed76 close all the block drivers before the qemu process exits
This patch calls the close handler of the block driver before the qemu
process exits.

This is necessary because the sheepdog block driver releases the lock
of VM images in the close handler.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:40 +02:00
Kevin Wolf
08a00559f0 block: Assume raw for drives without media
qemu -cdrom /dev/cdrom with an empty CD-ROM drive doesn't work any more because
we try to guess the format and when this fails (because there is no medium) we
exit with an error message.

This patch should restore the old behaviour by assuming raw format for such
drives.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:40 +02:00
Jes Sorensen
eb5a316514 Cleanup: Be consistent and use BDRV_SECTOR_SIZE instead of 512
Clean up block.c and use BDRV_SECTOR_SIZE rather than hard coded
numbers (512) when referring to sector size throughout the code.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:39 +02:00
Jes Sorensen
3e82990b52 Cleanup: bdrv_open() no need to shift total_size just to shift back.
In bdrv_open() there is no need to shift total_size >> 9 just to
multiply it by 512 again just a few lines later, since this is the
only place the variable is used.

Mask with BDRV_SECTOR_MASK to protect against case where we are
passed a corrupted image.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:39 +02:00
Anthony Liguori
358c360feb Merge remote branch 'kwolf/for-anthony' into staging 2010-06-03 14:55:49 -05:00
Luiz Capitulino
637503d122 Monitor: Drop QMP documentation from code
Previous commit added QMP documentation to the qemu-monitor.hx
file, it's is a copy of this information.

While it's good to keep it near code, maintaining two copies of
the same information is too hard and has little benefit as we
don't expect client writers to consult the code to find how to
use a QMP command.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-06-01 13:48:43 -05:00
Nicholas A. Bellinger
1a39685910 block: Add missing bdrv_delete() for SG_IO BlockDriver in find_image_format()
This patch adds a missing bdrv_delete() call in find_image_format() so that a
SG_IO BlockDriver properly releases the temporary BlockDriverState *bs created
from bdrv_file_open()

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Reported-by: Chris Krumme <chris.krumme@windriver.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:29:17 +02:00
MORITA Kazutaka
b50cbabc1b add support for protocol driver create_options
This patch enables protocol drivers to use their create options which
are not supported by the format.  For example, protcol drivers can use
a backing_file option with raw format.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:14:25 +02:00
Kevin Wolf
cbf1dff2f1 block: Fix multiwrite with overlapping requests
With overlapping requests, the total number of sectors is smaller than the sum
of the nb_sectors of both requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:14:25 +02:00
Alexander Graf
016f5cf6ff Add cache=unsafe parameter to -drive
Usually the guest can tell the host to flush data to disk. In some cases we
don't want to flush though, but try to keep everything in cache.

So let's add a new cache value to -drive that allows us to set the cache
policy to most aggressive, disabling flushes. We call this mode "unsafe",
as guest data is not guaranteed to survive host crashes anymore.

This patch also adds a noop function for aio, so we can do nothing in AIO
fashion.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-05-26 20:05:14 +02:00
Nicholas Bellinger
396759ad4a block: Add SG_IO device check in refresh_total_sectors()
This patch adds a special case check for scsi-generic devices in
refresh_total_sectors() to skip the subsequent BlockDriver->bdrv_getlength()
that will be returning -ESPIPE from block/raw-posic.c:raw_getlength() for
BlockDriverState->sg=1 devices.

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Nicholas Bellinger
f8ea0b00e0 block: Make find_image_format() return 'raw' BlockDriver for SG_IO devices
This patch adds a special BlockDriverState->sg check in block.c:find_image_format()
after bdrv_file_open() -> block/raw-posix.c:hdev_open() has been called to determine
if we are dealing with a Linux host scsi-generic device.

The patch then returns the BlockDriver * from bdrv_find_format("raw"), skipping the
subsequent bdrv_read() and rest of find_image_format().

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Christoph Hellwig
77be4366ba block: fix sector comparism in multiwrite_req_compare
The difference between the start sectors of two requests can be larger
than the size of the "int" type, which can lead to a not correctly
sorted multiwrite array and thus spurious I/O errors and filesystem
corruption due to incorrect request merges.

So instead of doing the cute sector arithmetics trick spell out the
exact comparisms.

Spotted by Kevin Wolf based on a testcase from Michael Tokarev.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Kevin Wolf
35ed5de6be block: Remove special case for vvfat
The special case doesn't really us buy anything. Without it vvfat works more
consistently as a protocol. We get raw on top of vvfat now, which works just
as well as using vvfat directly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Daniel P. Berrange
21955137ee Fix docs for block stats monitor command
The 'parent' field in the 'query-blockstats' monitor command is
part of the top level block device QDict, not part of the 2nd
level 'stats' QDict.

* block.c: Fix docs for 'parent' field in block stats monitor
  command output

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Bruce Rogers
af474591e5 use qemu_free() instead of free()
There is a call to free() where qemu_free() should instead be used.

Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
c33491978c block: Fix bdrv_commit
When reopening the image, don't guess the driver, but use the same driver as
was used before. This is important if the format=... option was used for that
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
209930818b block: Fix protocol detection for Windows devices
We can't assume the file protocol for Windows devices, they need the same
detection as other files for which an explicit protocol is not specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
b666d23950 block: Avoid unchecked casts for AIOCBs
Use container_of for one direction and &acb->common for the other one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Jan Kiszka
d748768c09 block: Release allocated options after bdrv_open
They aren't used afterwards nor supposed to be stored by a bdrv_create
handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Kevin Wolf
294cc35f3d block: Add wr_highest_sector blockstat
This adds the wr_highest_sector blockstat which implements what is generally
known as the high watermark. It is the highest offset of a sector written to
the respective BlockDriverState since it has been opened.

The query-blockstat QMP command is extended to add this value to the result,
and also to add the statistics of the underlying protocol in a new "parent"
field. Note that to get the "high watermark" of a qcow2 image, you need to look
into the wr_highest_sector field of the parent (which can be a file, a
host_device, ...). The wr_highest_sector of the qcow2 BlockDriverState itself
is the highest offset on the _virtual_ disk that the guest has written to.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Stefan Hajnoczi
51762288b4 block: Cache total_sectors to reduce bdrv_getlength calls
The BlockDriver bdrv_getlength function is called from the I/O code path
when checking that the request falls within the device.  Unfortunately
this involves an lseek system call in the raw protocol; every read or
write request will incur this lseek cost.

Jan Kiszka <jan.kiszka@siemens.com> identified this issue and its
latency overhead.  This patch caches device length in the existing
total_sectors variable so lseek calls can be avoided for fixed size
devices.

Growable devices fall back to the full bdrv_getlength code path because
I have not added logic to detect extending the size of the device in a
write.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Stefan Hajnoczi
557df6aca2 block: Set backing_hd to NULL after deleting it
It is safer to set backing_hd to NULL after deleting it so that any use
after deletion is obvious during development.  Happy segfaulting!

This patch should be applied after Kevin Wolf's "vmdk: Convert to
bdrv_open" so that vmdk does not segfault on close.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:31 +02:00
Kevin Wolf
f2feebbd93 block: bdrv_has_zero_init
This fixes the problem that qemu-img's use of no_zero_init only considered the
no_zero_init flag of the format driver, but not of the underlying protocols.

Between the raw/file split and this fix, converting to host devices is broken.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
66f82ceed6 block: Open the underlying image file in generic code
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.

For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
5791533251 block: Avoid forward declaration of bdrv_open_common
Move bdrv_open_common so it's defined before its callers and remove the forward
declaration.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
b6ce07aa83 block: Split bdrv_open
bdrv_open contains quite some code that is only useful for opening images (as
opposed to opening files by a protocol), for example snapshots.

This patch splits the code so that we have bdrv_open_file() for files (uses
protocols), bdrv_open() for images (uses format drivers) and bdrv_open_common()
for the code common for opening both images and files.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Christoph Hellwig
84a12e6648 block: separate raw images from the file protocol
We're running into various problems because the "raw" file access, which
is used internally by the various image formats is entangled with the
"raw" image format, which maps the VM view 1:1 to a file system.

This patch renames the raw file backends to the file protocol which
is treated like other protocols (e.g. nbd and http) and adds a new
"raw" image format which is just a wrapper around calls to the underlying
protocol.

The patch is surprisingly simple, besides changing the probing logical
in block.c to only look for image formats when using bdrv_open and
renaming of the old raw protocols to file there's almost nothing in there.

For creating images, a new bdrv_create_file is introduced which guesses the
protocol to use. This allows using qemu-img create -f raw (or just using the
default) for both files and host devices. Converting the other format drivers
to use this function to create their images is left for later patches.

The only issues still open are in the handling of the host devices.
Firstly in current qemu we can specifiy the host* format names
on various command line acceping images, but the new code can't
do that without adding some translation.  Second the layering breaks
the no_zero_init flag in the BlockDriver used by qemu-img.  I'm not
happy how this is done per-driver instead of per-state so I'll
prepare a separate patch to clean this up.

There's some more cleanup opportunity after this patch, e.g. using
separate lists and registration functions for image formats vs
protocols and maybe even host drivers, but this can be done at a
later stage.

Also there's a check for protocol in bdrv_open for the BDRV_O_SNAPSHOT
case that I don't quite understand, but which I fear won't work as
expected - possibly even before this patch.

Note that this patch requires various recent block patches from Kevin
and me, which should all be in his block queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Stefan Hajnoczi
1e1ea48d42 block: Free iovec arrays allocated by multiwrite_merge()
A new iovec array is allocated when creating a merged write request.
This patch ensures that the iovec array is deleted in addition to its
qiov owner.

Reported-by: Leszek Urbanski <tygrys@moo.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:58 +02:00
Stefan Hajnoczi
8a22f02a88 block: Convert first_drv to QLIST
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
1b7bdbc13c block: Convert bdrv_first to QTAILQ
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
b66460e4e9 block: Do not export bdrv_first
The bdrv_first linked list of BlockDriverStates is currently extern so
that block migration can iterate the list.  However, since there is
already a bdrv_iterate() function there is no need to expose bdrv_first.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Christoph Hellwig
6db956039d block: get rid of the BDRV_O_FILE flag
BDRV_O_FILE is only used to communicate between bdrv_file_open and bdrv_open.
It affects two things:  first bdrv_open only searches for protocols using
find_protocol instead of all image formats and host drivers.  We can easily
move that to the caller and pass the found driver to bdrv_open.  Second
it is used to not force a read-write open of a snapshot file.  But we never
use bdrv_file_open to open snapshots and this behaviour doesn't make sense
to start with.

qemu-io abused the BDRV_O_FILE for it's growable option, switch it to
using bdrv_file_open to make sure we only open files as growable were
we can actually support that.

This patch requires Kevin's "[PATCH] Replace calls of old bdrv_open" to
be applied first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
d6e9098e10 Replace calls of old bdrv_open
What is known today as bdrv_open2 becomes the new bdrv_open. All remaining
callers of the old function are converted to the new one. In some places they
even know the right format, so they should have used bdrv_open2 from the
beginning.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
8b9b0cc2fd blkdebug: Add events and rules
Block drivers can trigger a blkdebug event whenever they reach a place where it
could be useful to inject an error for testing/debugging purposes.

Rules are read from a blkdebug config file and describe which action is taken
when an event is triggered. For now this is only injecting an error (with a few
options) or changing the state (which is an integer). Rules can be declared to
be active only in a specific state; this way later rules can distiguish on
which path we came to trigger their event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
7eb58a6c55 block: Fix multiwrite memory leak in error case
Previously multiwrite_user_cb was never called if a request in the multiwrite
batch failed right away because it did set mcb->error immediately. Make it look
more like a normal callback to fix this.

Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:39:35 +02:00
Kevin Wolf
0f0b604b00 block: Fix error code in multiwrite for immediate failures
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:39:33 +02:00
Kevin Wolf
cb6d3ca07b block: Fix multiwrite error handling
When two requests of the same multiwrite batch fail, the callback of all
requests in that batch were called twice. This could have any kind of nasty
effects, in my case it lead to use after free and eventually a segfault.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:14:23 +02:00
Shahar Havivi
fd04a2aeda Wrong error message in block_passwd command
Signed-off-by: Shahar Havivi <shaharh@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-03-17 10:41:38 -05:00
Naphtali Sprei
4dca4b639c block: more read-only changes, related to backing files
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
  If upgrade fail, back to read-only. If also fail, "disconnect" the drive.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-19 15:32:15 -06:00
Luiz Capitulino
ba14414174 Monitor: remove unneeded checks
It's not needed to check the return of qobject_from_jsonf()
anymore, as an assert() has been added there.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 13:46:17 -06:00
Christoph Hellwig
15dc2697a5 block: saner flags filtering in bdrv_open2
Clean up the current mess about figuring out which flags to pass to the
driver.  BDRV_O_FILE, BDRV_O_SNAPSHOT and BDRV_O_NO_BACKING are flags
only used by the block layer internally so filter them out directly.
Previously BDRV_O_NO_BACKING could accidentally be passed to the drivers,
but wasn't ever used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Luiz Capitulino
2582bfedd2 block: BLOCK_IO_ERROR QMP event
This commit introduces the bdrv_mon_event() function, which
should be called by block subsystems (eg. IDE) when a I/O
error occurs, so that an QMP event is emitted.

The following information is currently provided in the event:

- device name
- operation (ie. "read" or "write")
- action taken (eg. "stop")

Event example:

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop" },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Liran Schour
aaa0eb75e2 Count dirty blocks and expose an API to get dirty count
This will manage dirty counter for each device and will allow to get the
dirty counter from above.

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-09 16:56:14 -06:00
Christoph Hellwig
e2a305fb13 block: avoid creating too large iovecs in multiwrite_merge
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.

Add a MAX_IOV defintion for platforms that don't have it.  For now we use
the same 1024 define that's used on Linux and various other platforms,
but until the windows block backend implements some kind of vectored I/O
it doesn't matter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 17:08:03 -06:00
Herve Poussineau
f8a83245d9 win32: pair qemu_memalign() with qemu_vfree()
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.

Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 16:41:06 -06:00
Christoph Hellwig
6987307ca3 block: clean up bdrv_open2 structure a bit
Check the whitelist as early as possible instead of continuing the
setup, and move all the error handling code to the end of the
function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 15:42:02 -06:00
Naphtali Sprei
37226ad946 No need anymoe for bdrv_set_read_only
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 15:42:01 -06:00
Kevin Wolf
9a8c4cceaf block: Return original error codes in bdrv_pread/write
Don't assume -EIO but return the real error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 14:59:19 -06:00
Anthony Liguori
3e39789b64 Revert "block: prevent multiwrite_merge from creating too large iovecs"
This reverts commit 0076bc0c1d.

Kevin Wolf pointed out that this breaks the mingw32 build.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 10:12:23 -06:00
Christoph Hellwig
0076bc0c1d block: prevent multiwrite_merge from creating too large iovecs
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:51:40 -06:00
Christoph Hellwig
1d44952fc7 block: fix cache flushing in bdrv_commit
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:51:11 -06:00
Naphtali Sprei
03cbdac7ef Disable fall-back to read-only when cannot open drive's file for read-write
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:25:22 -06:00
Naphtali Sprei
f5edb014ed Clean-up a little bit the RW related bits of BDRV_O_FLAGS. BDRV_O_RDONLY gone (and so is BDRV_O_ACCESS). Default value for bdrv_flags (0/zero) is READ-ONLY. Need to explicitly request READ-WRITE.
Instead of using the field 'readonly' of the BlockDriverState struct for passing the request,
pass the request in the flags parameter to the function.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:25:22 -06:00
Christoph Hellwig
3f5075ae63 block: flush backing_hd in the right place
The backing device is only modified from bdrv_commit.  So instead of
flushing it every time bdrv_flush is called for the front-end device
only flush it after we're written data to it in bdrv_commit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kevin Wolf
756e6736a1 block: Add bdrv_change_backing_file
Introduce the functions needed to change the backing file of an image. The
function is implemented for qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kevin Wolf
b783e409bf block: Introduce BDRV_O_NO_BACKING
If an image references a backing file that doesn't exist, qemu-img info fails
to open this image. Exactly in this case the info would be valuable, though:
the user might want to find out which file is missing.

This patch introduces a BDRV_O_NO_BACKING flag to ignore the backing file when
opening the image. qemu-img info is the first user and provides info now even
if the backing file is invalid.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kirill A. Shutemov
114cdfa908 block.c: fix warning with _FORTIFY_SOURCE
CC    block.o
cc1: warnings being treated as errors
block.c: In function 'bdrv_open2':
block.c:400: error: ignoring return value of 'realpath', declared with attribute warn_unused_result

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-12-25 18:19:22 +00:00
Luiz Capitulino
218a536a7a block: Convert bdrv_info_stats() to QObject
Each device statistic information is stored in a QDict and
the returned QObject is a QList of all devices.

This commit should not change user output.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-12 07:59:49 -06:00
Luiz Capitulino
d15e546567 block: Convert bdrv_info() to QObject
Each block device information is stored in a QDict and the
returned QObject is a QList of all devices.

This commit should not change user output.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-12 07:59:49 -06:00
Jan Kiszka
c6d2283068 block migration: Cleanup dirty tracking code
This switches the dirty bitmap to a true bitmap, reducing its footprint
(specifically in caches). It moreover fixes off-by-one bugs in
set_dirty_bitmap (nb_sectors+1 were marked) and bdrv_get_dirty (limit
check allowed one sector behind end of drive). And is drops redundant
dirty_tracking field from BlockDriverState.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 10:48:52 -06:00
Jan Kiszka
6ea44308b0 block migration: Rework constants API
Instead of duplicating the definition of constants or introducing
trivial retrieval functions move the SECTOR constants into the public
block API. This also obsoletes sector_per_block in BlkMigState.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 10:48:52 -06:00
Jan Kiszka
a55eb92c22 block migration: Fix coding style and whitespaces
No functional changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 10:48:52 -06:00
lirans@il.ibm.com
7cd1e32a86 Expose a mechanism to trace block writes
To support live migration without shared storage we need to be able to trace
writes to disk while migrating. This Patch expose dirty block tracking per
device to be polled from upper layer.

Changes from v4:
- Register dirty tracking for each block device.
- Minor coding style issues.
- Block.c will now manage a dirty bitmap per device once
  bdrv_set_dirty_tracking() is called. Bitmap is polled by the upper
  layer (block-migration.c).

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-17 08:03:31 -06:00
Markus Armbruster
eb852011ab Configurable block format whitelist
We have code for a quite a few block formats.  While I trust that all
of these formats are useful at least for some people in some
circumstances, some of them are of a kind that friends don't let
friends use in production.

This patch provides an optional block format whitelist, default off.
If a whitelist is configured with --block-drv-whitelist, QEMU proper
can use only whitelisted formats.  Other programs, like qemu-img, are
not affected.

Drivers for formats off the whitelist still participate in format
probing, to ensure all programs probe exactly the same.  Without that,
QEMU proper would be prone to treat images with a format off the
whitelist as raw when the image's format is probed.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:02 -06:00
Naphtali Sprei
59f2689d90 Added readonly flag to -drive command
This is a slightly revised patch for adding readonly flag to the -drive command.
Even though this patch is "stand-alone", it assumes a previous related patch (in Anthony staging tree), that passes
the readonly attribute of the drive to the guest OS, applied first.

This enables sharing same image between guests, with readonly access.
Implementaion mark the drive as read_only and changes the flags when actually opening the file.
The readonly attribute of a qcow also passed to it's base file.
For ide that cannot pass the readonly attribute to the guest OS, disallow the readonly flag.

Also, return error code from bdrv_truncate for readonly drive.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:01 -06:00
Kevin Wolf
65d6b3d885 block: Use new AsyncContext for bdrv_read/write emulation
bdrv_read/write emulation is used as the perfect example why we need something
like AsyncContexts. So maybe they better start using it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:59 -05:00
Blue Swirl
72cf2d4f0e Fix sys-queue.h conflict for good
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.

Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-12 07:36:22 +00:00
Christoph Hellwig
b2e12bc6e3 block: add aio_flush operation
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously.  Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix.  Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Christoph Hellwig
e900a7b748 block: add enable_write_cache flag
Add a enable_write_cache flag in the block driver state, and use it to
decide if we claim to have a volatile write cache that needs controlled
flushing from the guest.  The flag is off if cache=writethrough is
defined because O_DSYNC guarantees that every write goes to stable
storage, and it is on for cache=none and cache=writeback.

Both scsi-disk and ide now use the new flage, changing from their
defaults of always off (ide) or always on (scsi-disk).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Kevin Wolf
40b4f53967 Add bdrv_aio_multiwrite
One performance problem of qcow2 during the initial image growth are
sequential writes that are not cluster aligned. In this case, when a first
requests requires to allocate a new cluster but writes only to the first
couple of sectors in that cluster, the rest of the cluster is zeroed - just
to be overwritten by the following second request that fills up the cluster.

Let's try to merge sequential write requests to the same cluster, so we can
avoid to write the zero padding to the disk in the first place.

As a nice side effect, also other formats take advantage of dealing with less
and larger requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:18:06 -05:00
Christoph Hellwig
5c6c3a6c54 raw-posix: add Linux native AIO support
Now that do have a nicer interface to work against we can add Linux native
AIO support.  It's an extremly thing layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to do complete the iocbs directly
from there.

This started out based on Anthony's earlier AIO patch, but after
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.

To enable native kernel aio use the aio=native sub-command on the
drive command line.  I have also added an option to qemu-io to
test the aio support without needing a guest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:22 -05:00