The code changed here is an unused data type name (evt_flush_occurred).
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The block layer caches the device size to avoid doing lseek(fd, 0,
SEEK_END) every time this value is needed. For removable media the
device size becomes stale if a new medium is inserted. This patch
simply prevents device size caching for removable media.
A smarter solution is to update the cached device size when a new medium
is inserted. Given that there are currently bugs with CD-ROM media
change I do not want to implement that approach until we've gotten
things correct first.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It can be handy to know when the guest locks/unlocks the CD-ROM tray.
This trace event makes that possible.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When removing a drive from the host-side via drive_del we currently have
the following path:
drive_del
qemu_aio_flush()
bdrv_close() // zaps bs->drv, which makes any subsequent I/O get
// dropped. Works as designed
drive_uninit()
bdrv_delete() // frees the bs. Since the device is still connected to
// bs, any subsequent I/O is a use-after-free.
The value of bs->drv becomes unpredictable on free. As long as it
remains null, I/O still gets dropped, however it could become non-null
at any point after the free resulting SEGVs or other QEMU state
corruption.
To resolve this issue as simply as possible, we can chose to not
actually delete the BlockDriverState pointer. Since bdrv_close()
handles setting the drv pointer to NULL, we just need to remove the
BlockDriverState from the QLIST that is used to enumerate the block
devices. This is currently handled within bdrv_delete, so move this
into its own function, bdrv_make_anon().
The result is that we can now invoke drive_del, this closes the file
descriptors and sets BlockDriverState->drv to NULL which prevents futher
IO to the device, and since we do not free BlockDriverState, we don't
have to worry about the copy retained in the block devices.
We also don't attempt to remove the qdev property since we are no longer
deleting the BlockDriverState on drives with associated drives. This
also allows for removing Drives with no devices associated either.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the block device has been closed, we no longer have a medium to submit
IO against, check for this before submitting io. This prevents a segfault
further in the code where we dereference elements of the block driver.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a trace event for bdrv_aio_flush() to complement the existing
bdrv_aio_readv() and bdrv_aio_writev() events.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Other geometry guessing functions already reside in block.c.
Remove some unused or debugging only fields.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Set block device in use during block migration, disallow drive_del and
bdrv_truncate for in use devices.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Certain operations such as drive_del or resize cannot be performed
while external users (eg. block migration) reference the block device.
Add a flag to indicate that.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Extend the change_cb callback with a reason argument, and use it
to tell drivers about size changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The backing format should be honored during image creation. For some
reason we currently use the image format to open the backing file. This
fails when the backing file has a different format than the image being
created. Keep the image and backing format drivers completely separate.
Also print the backing filename if there is an error opening the backing
file instead of the image filename.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Avoid a warning with GCC 4.6.0:
/src/qemu/block.c: In function 'bdrv_img_create':
/src/qemu/block.c:2862:25: error: variable 'fmt' set but not used [-Werror=unused-but-set-variable]
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard. If no discard
granularity support is set discard support is disabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin suggested to have bdrv_img_create() return proper -errno values
on error.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch re-factors img_create() moving the code doing the actual
work into block.c where it can be shared with QEMU. This is needed to
be able to create images from QEMU to be used for live snapshots.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Backing filenames may contain a protocol. The code currently doesn't
consider this case and produces filenames that embed "<protocol>:".
Don't combine filenames if the backing filename contains a protocol.
Based on an earlier patch by Anthony Liguori <aliguori@us.ibm.com>.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_find_protocol() function returns NULL if an unknown protocol
name is given. It returns the "file" protocol when the filename
contains no protocol at all. This makes it difficult to distinguish
between paths which contain a protocol and those which do not.
Factor out a helper function that tests whether or not a filename has a
protocol. The next patch makes use of this function.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Filenames may start with "<protocol>:" to explicitly use a protocol like
nbd. Filenames with unknown protocols are rejected in most of QEMU
except for bdrv_create_file(). Even if a file with an invalid filename
can be created, QEMU cannot use it since all the other relevant
functions reject such paths. Make bdrv_create_file() consistent.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sectors are marked dirty in the bitmap on AIO submission. This is wrong
since data has not reached storage.
Set a given sector as dirty in the dirty bitmap on AIO completion, so that
reading a sector marked as dirty is guaranteed to return uptodate data.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Otherwise upper 32 bits of bitmap entries are not correctly calculated.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage.
The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img.
Right now, it only supports to copy the full snapshot, delta snapshot is on the way.
Changes from V1: all the comments from Kevin are addressed:
Add read-only checking
Fix coding style
Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp
Signed-off-by: Disheng Su <edison@cloud.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Observing block layer aio readv/writev operations is useful for
debugging image formats or understanding guest disk I/O patterns.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This reverts commit 79368c81bf.
Conflicts:
block.c
I haven't been able to come up with a solution yet for the corruption caused by
unaligned requests from the IDE disk so revert until a solution can be written.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Arguably we should re-open the backing file with the backing file format and
not with the format of the snapshot image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_eject() gets called when a device model opens or closes the tray.
If the block driver implements method bdrv_eject(), that method gets
called. Drivers host_cdrom implements it, and it opens and closes the
physical tray, and nothing else. When a device model opens, then
closes the tray, media changes only if the user actively changes the
physical media while the tray is open. This is matches how physical
hardware behaves.
If the block driver doesn't implement method bdrv_eject(), we do
something quite different: opening the tray severs the connection to
the image by calling bdrv_close(), and closing the tray does nothing.
When the device model opens, then closes the tray, media is gone,
unless the user actively inserts another one while the tray is open,
with a suitable change command in the monitor. This isn't how
physical hardware behaves. Rather inconvenient when programs
"helpfully" eject media to give you a chance to change it. The way
bdrv_eject() behaves here turns that chance into a must, which is not
what these programs or their users expect.
Change the default action not to call bdrv_close(). Instead, note the
tray status in new BlockDriverState member tray_open. Use it in
bdrv_is_inserted().
Arguably, the device models should keep track of tray status
themselves. But this is less invasive.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_commit copies the image to its backing file sector by sector, which
is (surprise!) relatively slow. Let's take a larger buffer and handle more
sectors at once if possible.
With a 1G qcow2 file, this brought the time bdrv_commit takes down from
5:06 min to 1:14 min for me.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block device change command did not copy BDRV_O_SNAPSHOT flag. Thus
the new image did not have this flag and the file got deleted during
opening.
Fix by copying BDRV_O_SNAPSHOT flag.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
"No such file or directory" is a misleading error message
when a user tries to open a file with wrong permissions.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest. To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.
Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.
Most users want block probing so we should try to make it safer.
This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device. If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros. If a user specifies an explicit format parameter, this
behavior is disabled.
I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.
I've tested this pretty extensively both in the positive and negative case. I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.
Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
People think that their images are corrupted when in fact there are just some
leaked clusters. Differentiating several error cases should make the messages
more comprehensible.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Don't try to be clever by freeing all temporary data and calling all callbacks
when the return value (an error) is certain. Doing so has at least two
important problems:
* The temporary data that is freed (qiov, possibly zero buffer) is still used
by the requests that have not yet completed.
* Calling the callbacks for all requests in the multiwrite means for the caller
that it may free buffers etc. which are still in use.
Just remember the error value and do the cleanup when all requests have
completed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_aio_writev may call the callback immediately (and it will commonly do so
in error cases). Current code doesn't consider this. For details see the
comment added by this patch.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed. It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.
The type hint is only set by drive_init(). It sets BDRV_TYPE_FLOPPY
for if=floppy. It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.
if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.
For the same reason, if=none works when it's used by ide-drive or
scsi-disk. For other guest devices, there are problems:
* fdc: you can't change virtual media
$ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) eject foo
Device 'foo' is not removable
unless you add media=cdrom, but that makes it readonly.
* virtio: if you add media=cdrom, you can change virtual media. If
you eject, the guest gets I/O errors. If you change, the guest sees
the drive's contents suddenly change.
* scsi-generic: if you add media=cdrom, you can change virtual media.
I didn't test what that does to the guest or the physical device,
but it can't be pretty.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
savevm.c keeps a pointer to the snapshot block device. If you manage
to get that device deleted, the pointer dangles, and the next snapshot
operation will crash & burn. Unplugging a guest device that uses it
does the trick:
$ MALLOC_PERTURB_=234 qemu-system-x86_64 [...]
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) info snapshots
No available block device supports snapshots
(qemu) drive_add auto if=none,file=tmp.qcow2
OK
(qemu) device_add usb-storage,id=foo,drive=none1
(qemu) info snapshots
Snapshot devices: none1
Snapshot list (from none1):
ID TAG VM SIZE DATE VM CLOCK
(qemu) device_del foo
(qemu) info snapshots
Snapshot devices:
Segmentation fault (core dumped)
Move management of that pointer to block.c, and zap it when the device
it points becomes unusable.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For instance, -device scsi-disk,drive=foo -device scsi-disk,drive=foo
happily creates two SCSI disks connected to the same block device.
It's all downhill from there.
Device usb-storage deliberately attaches twice to the same blockdev,
which fails with the fix in place. Detach before the second attach
there.
Also catch attempt to delete while a guest device model is attached.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To fix https://bugs.launchpad.net/qemu/+bug/597402 where qemu fails to
call unlink() on temporary snapshots due to bs->is_temporary getting clobbered
in bdrv_open_common() after being set in bdrv_open() which calls the former.
We don't need to initialize bs->is_temporary in bdrv_open_common().
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before the raw/file split we used to allow filenames with colons for host
device only. While this was more by accident than by design people rely
on it, so we need to bring it back.
So move the host device probing to be before the protocol detection
again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add new functions that write and flush the written data to disk immediately.
This is what needs to be used for image format metadata to maintain integrity
for cache=... modes that don't use O_DSYNC. (Actually, we only need barriers,
and therefore the functions are defined as such, but flushes is what is
implemented in this patch - we can try to change that later)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fix a warning from OpenBSD gcc (3.3.5 (propolice)):
/src/qemu/block.c: In function `bdrv_info_stats_bs':
/src/qemu/block.c:1548: warning: long long int format, long unsigned
int arg (arg 6)
There may be also truncation effects.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a more flexible alternative to bdrv_iterate().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
do_commit() and mux_proc_byte() iterate over the list of drives
defined with drive_init(). This misses host block devices defined by
other means. Such means don't exist now, but will be introduced later
in this series.
Change them to use new bdrv_commit_all(), which iterates over all host
block devices.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Both bdrv_can_snapshot() and bdrv_has_snapshot() does not work as advertized.
First issue: Their names implies different porpouses, but they do the same thing
and have exactly the same code. Maybe copied and pasted and forgotten?
bdrv_has_snapshot() is called in various places for actually checking if there
is snapshots or not.
Second issue: the way bdrv_can_snapshot() verifies if a block driver supports or
not snapshots does not catch all cases. E.g.: a raw image.
So when do_savevm() is called, first thing it does is to set a global
BlockDriverState to save the VM memory state calling get_bs_snapshots().
static BlockDriverState *get_bs_snapshots(void)
{
BlockDriverState *bs;
DriveInfo *dinfo;
if (bs_snapshots)
return bs_snapshots;
QTAILQ_FOREACH(dinfo, &drives, next) {
bs = dinfo->bdrv;
if (bdrv_can_snapshot(bs))
goto ok;
}
return NULL;
ok:
bs_snapshots = bs;
return bs;
}
bdrv_can_snapshot() may return a BlockDriverState that does not support
snapshots and do_savevm() goes on.
Later on in do_savevm(), we find:
QTAILQ_FOREACH(dinfo, &drives, next) {
bs1 = dinfo->bdrv;
if (bdrv_has_snapshot(bs1)) {
/* Write VM state size only to the image that contains the state */
sn->vm_state_size = (bs == bs1 ? vm_state_size : 0);
ret = bdrv_snapshot_create(bs1, sn);
if (ret < 0) {
monitor_printf(mon, "Error while creating snapshot on '%s'\n",
bdrv_get_device_name(bs1));
}
}
}
bdrv_has_snapshot(bs1) is not checking if the device does support or has
snapshots as explained above. Only in bdrv_snapshot_create() the device is
actually checked for snapshot support.
So, in cases where the first device supports snapshots, and the second does not,
the snapshot on the first will happen anyways. I believe this is not a good
behavior. It should be an all or nothing process.
This patch addresses these issues by making bdrv_can_snapshot() actually do
what it must do and enforces better tests to avoid errors in the middle of
do_savevm(). bdrv_has_snapshot() is removed and replaced by bdrv_can_snapshot()
where appropriate.
bdrv_can_snapshot() was moved from savevm.c to block.c. It makes more sense to me.
The loadvm_state() function was updated too to enforce that when loading a VM at
least all writable devices must support snapshots too.
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When snapshot handlers are not defined in the format driver, it is
better to call the ones of the protocol driver. This enables us to
implement snapshot support in the protocol driver.
We need to call bdrv_close() and bdrv_open() handlers of the format
driver before and after bdrv_snapshot_goto() call of the protocol. It is
because the contents of the block driver state may need to be changed
after loading vmstate.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch calls the close handler of the block driver before the qemu
process exits.
This is necessary because the sheepdog block driver releases the lock
of VM images in the close handler.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu -cdrom /dev/cdrom with an empty CD-ROM drive doesn't work any more because
we try to guess the format and when this fails (because there is no medium) we
exit with an error message.
This patch should restore the old behaviour by assuming raw format for such
drives.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clean up block.c and use BDRV_SECTOR_SIZE rather than hard coded
numbers (512) when referring to sector size throughout the code.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In bdrv_open() there is no need to shift total_size >> 9 just to
multiply it by 512 again just a few lines later, since this is the
only place the variable is used.
Mask with BDRV_SECTOR_MASK to protect against case where we are
passed a corrupted image.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previous commit added QMP documentation to the qemu-monitor.hx
file, it's is a copy of this information.
While it's good to keep it near code, maintaining two copies of
the same information is too hard and has little benefit as we
don't expect client writers to consult the code to find how to
use a QMP command.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds a missing bdrv_delete() call in find_image_format() so that a
SG_IO BlockDriver properly releases the temporary BlockDriverState *bs created
from bdrv_file_open()
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Reported-by: Chris Krumme <chris.krumme@windriver.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch enables protocol drivers to use their create options which
are not supported by the format. For example, protcol drivers can use
a backing_file option with raw format.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
With overlapping requests, the total number of sectors is smaller than the sum
of the nb_sectors of both requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Usually the guest can tell the host to flush data to disk. In some cases we
don't want to flush though, but try to keep everything in cache.
So let's add a new cache value to -drive that allows us to set the cache
policy to most aggressive, disabling flushes. We call this mode "unsafe",
as guest data is not guaranteed to survive host crashes anymore.
This patch also adds a noop function for aio, so we can do nothing in AIO
fashion.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch adds a special case check for scsi-generic devices in
refresh_total_sectors() to skip the subsequent BlockDriver->bdrv_getlength()
that will be returning -ESPIPE from block/raw-posic.c:raw_getlength() for
BlockDriverState->sg=1 devices.
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a special BlockDriverState->sg check in block.c:find_image_format()
after bdrv_file_open() -> block/raw-posix.c:hdev_open() has been called to determine
if we are dealing with a Linux host scsi-generic device.
The patch then returns the BlockDriver * from bdrv_find_format("raw"), skipping the
subsequent bdrv_read() and rest of find_image_format().
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The difference between the start sectors of two requests can be larger
than the size of the "int" type, which can lead to a not correctly
sorted multiwrite array and thus spurious I/O errors and filesystem
corruption due to incorrect request merges.
So instead of doing the cute sector arithmetics trick spell out the
exact comparisms.
Spotted by Kevin Wolf based on a testcase from Michael Tokarev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The special case doesn't really us buy anything. Without it vvfat works more
consistently as a protocol. We get raw on top of vvfat now, which works just
as well as using vvfat directly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The 'parent' field in the 'query-blockstats' monitor command is
part of the top level block device QDict, not part of the 2nd
level 'stats' QDict.
* block.c: Fix docs for 'parent' field in block stats monitor
command output
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is a call to free() where qemu_free() should instead be used.
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When reopening the image, don't guess the driver, but use the same driver as
was used before. This is important if the format=... option was used for that
image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We can't assume the file protocol for Windows devices, they need the same
detection as other files for which an explicit protocol is not specified.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
They aren't used afterwards nor supposed to be stored by a bdrv_create
handler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the wr_highest_sector blockstat which implements what is generally
known as the high watermark. It is the highest offset of a sector written to
the respective BlockDriverState since it has been opened.
The query-blockstat QMP command is extended to add this value to the result,
and also to add the statistics of the underlying protocol in a new "parent"
field. Note that to get the "high watermark" of a qcow2 image, you need to look
into the wr_highest_sector field of the parent (which can be a file, a
host_device, ...). The wr_highest_sector of the qcow2 BlockDriverState itself
is the highest offset on the _virtual_ disk that the guest has written to.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The BlockDriver bdrv_getlength function is called from the I/O code path
when checking that the request falls within the device. Unfortunately
this involves an lseek system call in the raw protocol; every read or
write request will incur this lseek cost.
Jan Kiszka <jan.kiszka@siemens.com> identified this issue and its
latency overhead. This patch caches device length in the existing
total_sectors variable so lseek calls can be avoided for fixed size
devices.
Growable devices fall back to the full bdrv_getlength code path because
I have not added logic to detect extending the size of the device in a
write.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is safer to set backing_hd to NULL after deleting it so that any use
after deletion is obvious during development. Happy segfaulting!
This patch should be applied after Kevin Wolf's "vmdk: Convert to
bdrv_open" so that vmdk does not segfault on close.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes the problem that qemu-img's use of no_zero_init only considered the
no_zero_init flag of the format driver, but not of the underlying protocols.
Between the raw/file split and this fix, converting to host devices is broken.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.
For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open contains quite some code that is only useful for opening images (as
opposed to opening files by a protocol), for example snapshots.
This patch splits the code so that we have bdrv_open_file() for files (uses
protocols), bdrv_open() for images (uses format drivers) and bdrv_open_common()
for the code common for opening both images and files.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're running into various problems because the "raw" file access, which
is used internally by the various image formats is entangled with the
"raw" image format, which maps the VM view 1:1 to a file system.
This patch renames the raw file backends to the file protocol which
is treated like other protocols (e.g. nbd and http) and adds a new
"raw" image format which is just a wrapper around calls to the underlying
protocol.
The patch is surprisingly simple, besides changing the probing logical
in block.c to only look for image formats when using bdrv_open and
renaming of the old raw protocols to file there's almost nothing in there.
For creating images, a new bdrv_create_file is introduced which guesses the
protocol to use. This allows using qemu-img create -f raw (or just using the
default) for both files and host devices. Converting the other format drivers
to use this function to create their images is left for later patches.
The only issues still open are in the handling of the host devices.
Firstly in current qemu we can specifiy the host* format names
on various command line acceping images, but the new code can't
do that without adding some translation. Second the layering breaks
the no_zero_init flag in the BlockDriver used by qemu-img. I'm not
happy how this is done per-driver instead of per-state so I'll
prepare a separate patch to clean this up.
There's some more cleanup opportunity after this patch, e.g. using
separate lists and registration functions for image formats vs
protocols and maybe even host drivers, but this can be done at a
later stage.
Also there's a check for protocol in bdrv_open for the BDRV_O_SNAPSHOT
case that I don't quite understand, but which I fear won't work as
expected - possibly even before this patch.
Note that this patch requires various recent block patches from Kevin
and me, which should all be in his block queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A new iovec array is allocated when creating a merged write request.
This patch ensures that the iovec array is deleted in addition to its
qiov owner.
Reported-by: Leszek Urbanski <tygrys@moo.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_first linked list of BlockDriverStates is currently extern so
that block migration can iterate the list. However, since there is
already a bdrv_iterate() function there is no need to expose bdrv_first.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BDRV_O_FILE is only used to communicate between bdrv_file_open and bdrv_open.
It affects two things: first bdrv_open only searches for protocols using
find_protocol instead of all image formats and host drivers. We can easily
move that to the caller and pass the found driver to bdrv_open. Second
it is used to not force a read-write open of a snapshot file. But we never
use bdrv_file_open to open snapshots and this behaviour doesn't make sense
to start with.
qemu-io abused the BDRV_O_FILE for it's growable option, switch it to
using bdrv_file_open to make sure we only open files as growable were
we can actually support that.
This patch requires Kevin's "[PATCH] Replace calls of old bdrv_open" to
be applied first.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
What is known today as bdrv_open2 becomes the new bdrv_open. All remaining
callers of the old function are converted to the new one. In some places they
even know the right format, so they should have used bdrv_open2 from the
beginning.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block drivers can trigger a blkdebug event whenever they reach a place where it
could be useful to inject an error for testing/debugging purposes.
Rules are read from a blkdebug config file and describe which action is taken
when an event is triggered. For now this is only injecting an error (with a few
options) or changing the state (which is an integer). Rules can be declared to
be active only in a specific state; this way later rules can distiguish on
which path we came to trigger their event.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previously multiwrite_user_cb was never called if a request in the multiwrite
batch failed right away because it did set mcb->error immediately. Make it look
more like a normal callback to fix this.
Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When two requests of the same multiwrite batch fail, the callback of all
requests in that batch were called twice. This could have any kind of nasty
effects, in my case it lead to use after free and eventually a segfault.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
If upgrade fail, back to read-only. If also fail, "disconnect" the drive.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It's not needed to check the return of qobject_from_jsonf()
anymore, as an assert() has been added there.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Clean up the current mess about figuring out which flags to pass to the
driver. BDRV_O_FILE, BDRV_O_SNAPSHOT and BDRV_O_NO_BACKING are flags
only used by the block layer internally so filter them out directly.
Previously BDRV_O_NO_BACKING could accidentally be passed to the drivers,
but wasn't ever used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This commit introduces the bdrv_mon_event() function, which
should be called by block subsystems (eg. IDE) when a I/O
error occurs, so that an QMP event is emitted.
The following information is currently provided in the event:
- device name
- operation (ie. "read" or "write")
- action taken (eg. "stop")
Event example:
{ "event": "BLOCK_IO_ERROR",
"data": { "device": "ide0-hd1",
"operation": "write",
"action": "stop" },
"timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This will manage dirty counter for each device and will allow to get the
dirty counter from above.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.
Add a MAX_IOV defintion for platforms that don't have it. For now we use
the same 1024 define that's used on Linux and various other platforms,
but until the windows block backend implements some kind of vectored I/O
it doesn't matter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.
Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Check the whitelist as early as possible instead of continuing the
setup, and move all the error handling code to the end of the
function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead of using the field 'readonly' of the BlockDriverState struct for passing the request,
pass the request in the flags parameter to the function.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The backing device is only modified from bdrv_commit. So instead of
flushing it every time bdrv_flush is called for the front-end device
only flush it after we're written data to it in bdrv_commit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Introduce the functions needed to change the backing file of an image. The
function is implemented for qcow2.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If an image references a backing file that doesn't exist, qemu-img info fails
to open this image. Exactly in this case the info would be valuable, though:
the user might want to find out which file is missing.
This patch introduces a BDRV_O_NO_BACKING flag to ignore the backing file when
opening the image. qemu-img info is the first user and provides info now even
if the backing file is invalid.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
CC block.o
cc1: warnings being treated as errors
block.c: In function 'bdrv_open2':
block.c:400: error: ignoring return value of 'realpath', declared with attribute warn_unused_result
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Each device statistic information is stored in a QDict and
the returned QObject is a QList of all devices.
This commit should not change user output.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Each block device information is stored in a QDict and the
returned QObject is a QList of all devices.
This commit should not change user output.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This switches the dirty bitmap to a true bitmap, reducing its footprint
(specifically in caches). It moreover fixes off-by-one bugs in
set_dirty_bitmap (nb_sectors+1 were marked) and bdrv_get_dirty (limit
check allowed one sector behind end of drive). And is drops redundant
dirty_tracking field from BlockDriverState.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead of duplicating the definition of constants or introducing
trivial retrieval functions move the SECTOR constants into the public
block API. This also obsoletes sector_per_block in BlkMigState.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
To support live migration without shared storage we need to be able to trace
writes to disk while migrating. This Patch expose dirty block tracking per
device to be polled from upper layer.
Changes from v4:
- Register dirty tracking for each block device.
- Minor coding style issues.
- Block.c will now manage a dirty bitmap per device once
bdrv_set_dirty_tracking() is called. Bitmap is polled by the upper
layer (block-migration.c).
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We have code for a quite a few block formats. While I trust that all
of these formats are useful at least for some people in some
circumstances, some of them are of a kind that friends don't let
friends use in production.
This patch provides an optional block format whitelist, default off.
If a whitelist is configured with --block-drv-whitelist, QEMU proper
can use only whitelisted formats. Other programs, like qemu-img, are
not affected.
Drivers for formats off the whitelist still participate in format
probing, to ensure all programs probe exactly the same. Without that,
QEMU proper would be prone to treat images with a format off the
whitelist as raw when the image's format is probed.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This is a slightly revised patch for adding readonly flag to the -drive command.
Even though this patch is "stand-alone", it assumes a previous related patch (in Anthony staging tree), that passes
the readonly attribute of the drive to the guest OS, applied first.
This enables sharing same image between guests, with readonly access.
Implementaion mark the drive as read_only and changes the flags when actually opening the file.
The readonly attribute of a qcow also passed to it's base file.
For ide that cannot pass the readonly attribute to the guest OS, disallow the readonly flag.
Also, return error code from bdrv_truncate for readonly drive.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
bdrv_read/write emulation is used as the perfect example why we need something
like AsyncContexts. So maybe they better start using it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously. Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix. Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a enable_write_cache flag in the block driver state, and use it to
decide if we claim to have a volatile write cache that needs controlled
flushing from the guest. The flag is off if cache=writethrough is
defined because O_DSYNC guarantees that every write goes to stable
storage, and it is on for cache=none and cache=writeback.
Both scsi-disk and ide now use the new flage, changing from their
defaults of always off (ide) or always on (scsi-disk).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
One performance problem of qcow2 during the initial image growth are
sequential writes that are not cluster aligned. In this case, when a first
requests requires to allocate a new cluster but writes only to the first
couple of sectors in that cluster, the rest of the cluster is zeroed - just
to be overwritten by the following second request that fills up the cluster.
Let's try to merge sequential write requests to the same cluster, so we can
avoid to write the zero padding to the disk in the first place.
As a nice side effect, also other formats take advantage of dealing with less
and larger requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now that do have a nicer interface to work against we can add Linux native
AIO support. It's an extremly thing layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to do complete the iocbs directly
from there.
This started out based on Anthony's earlier AIO patch, but after
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.
To enable native kernel aio use the aio=native sub-command on the
drive command line. I have also added an option to qemu-io to
test the aio support without needing a guest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The VM state offset is a concept internal to the image format. Replace
the old bdrv_{get,put}_buffer method that require an index into the
image file that is constructed from the VM state offset and an offset
into the vmstate with the bdrv_{load,save}_vmstate that just take an
offset into the VM state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit 6a7ad299 ("Call qemu_bh_delete at bdrv_aio_bh_cb") deletes emulated
aio bottom halves to prevent endless accumulation. However, it leaves a
stale ->bh pointer, which is then waited on when the aio is reused.
Zeroing the pointer fixes the issue, allowing vmdk format images to be used.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Fix missing strnlen (a GNU extension) problems by using qemu_strnlen
used for user emulators also for system emulators.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Problem: It is impossible to feed filenames with the character colon because
qemu interprets such names as a protocol. For example filename scsi:0, is
interpreted as a protocol by name "scsi".
This patch allows user to espace colon characters. For example the above
filename can now be expressed either as 'scsi\:0' or as file:scsi:0
anything following the "file:" tag is interpreted verbatin. However if "file:"
tag is omitted then any colon characters in the string must be escaped using
backslash.
Here are couple of examples:
scsi\:0\:abc is a local file scsi:0:abc
http\://myweb is a local file by name http://myweb
file:scsi:0:abc is a local file scsi:0:abc
file:http://myweb is a local file by name http://myweb
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Section 10.8.25 ("START/STOP UNIT Command") of SFF-8020i states that
if the device is locked we should refuse to eject if the device is
locked.
ASC_MEDIA_REMOVAL_PREVENTED is the appropriate return in this case.
In order to stop itself from ejecting the media it is running from,
Fedora's installer (anaconda) requires the CDROMEJECT ioctl() to fail
if the drive has been previously locked.
See also https://bugzilla.redhat.com/501412
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Also replave qemu_bh_cancel with qemu_bh_delete in bdrv_aio_cancel_em.
Otherwise the bh will live forever in the bh list.
Signed-off-by: Dor Laor <dor@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a bdrv_probe_device method to all BlockDriver instances implementing
host devices to move matching of host device types into the actual drivers.
For now we keep exacly the old matching behaviour based on the devices names,
although we really should have better detetion methods based on device
information in the future.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead of declaring one BlockDriver for all host devices declared one
for each type: a generic one for normal disk devices, a Linux floppy
driver and a CDROM driver for Linux and FreeBSD. This gets rid of a lot
of messy ifdefs and switching based on the type in the various removal
device methods.
block.c grows a new method to find the correct host device driver based
on OS-sepcific criteria, which will later into the actual drivers in a
later patch in this series.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Now that we have a separate aio pool structure we can remove those
aio pool details from BlockDriver.
Every driver supporting AIO now needs to declare a static AIOPool
with the aiocb size and the cancellation method. This cleans up the
current code considerably and will make it cleaner and more obvious
to support two different aio implementations behind a single
BlockDriver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch converts the remaining users of bdrv_create2 to bdrv_create and
removes the now unused function.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now we can make use of the newly introduced option structures. Instead of
having bdrv_create carry more and more parameters (which are format specific in
most cases), just pass a option structure as defined by the driver itself.
bdrv_create2() contains an emulation of the old interface to simplify the
transition.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch makes the range checks for block requests more strict: It fixes a
potential integer overflow and checks for negative offsets. Also, it adds the
check for compressed writes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
this patch adds a buffer_alignment field to BlockDriverState and
implements a qemu_blockalign function that uses that field to allocate a
memory aligned buffer to be used by the block driver.
buffer_alignment is initialized to 512 but each block driver can set
a different value (at the moment none of them do).
This patch modifies ide.c, block-qcow.c, block-qcow2.c and block.c to
use qemu_blockalign instead of qemu_memalign.
There is only one place left that still uses qemu_memalign to allocate
buffers used by block drivers that is posix-aio-compat:handle_aiocb_rw
because it is not possible to get the BlockDriverState from that
function. However I think it is not important because posix-aio-compat
already deals with driver specific code so it is supposed to know its
own needs.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7229 c046a42c-6fe2-441c-8c8c-71466251a162
From: Kevin Wolf <kwolf@redhat.com>
Introduce a new bdrv_check function pointer for block drivers. Modify qcow2 to
return an error status in check_refcounts(), so it can implement bdrv_check.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7214 c046a42c-6fe2-441c-8c8c-71466251a162
This ties up the preadv/pwritev syscalls to qemu if they are declared in
unistd.h. This is the case currently on at least NetBSD and OpenBSD and
will hopefully soon be the case on Linux.
Thanks to Blue Swirl and Gerd Hoffmann for the configure autodetection
of preadv/pwritev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7021 c046a42c-6fe2-441c-8c8c-71466251a162
Make all AIO requests vectored and defer linearization until the actual
I/O thread. This prepares for using native preadv/pwritev.
Also enables asynchronous direct I/O by handling that case in the I/O thread.
Qcow and qcow2 propably want to be adopted to directly deal with multi-segment
requests, but that can be implemented later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7020 c046a42c-6fe2-441c-8c8c-71466251a162
Always use the vectored APIs to reduce code churn once we switch the BlockDriver
API to be vectored.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7019 c046a42c-6fe2-441c-8c8c-71466251a162
We now enforce that you cannot write beyond the end of a non-growable file.
qcow2 files are not growable but we rely on them being growable to do
savevm/loadvm. Temporarily allow them to be growable by introducing a new
API specifically for savevm read/write operations.
Reported-by: malc
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6994 c046a42c-6fe2-441c-8c8c-71466251a162
All the bdrv_ helpers should check for bs->drv being zero as that means
there is no backend image open. bdrv_flush fails to perform that check
and can thus cause NULL pointer dereferences.
Found using qemu-io.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6943 c046a42c-6fe2-441c-8c8c-71466251a162
Remove code dealing with negative sector numbers for byte access in
bdrv_check_request as sector numbers can't ever be negative.
Previously we supported negative sector counts for byte access, but
never sector numbers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6942 c046a42c-6fe2-441c-8c8c-71466251a162
Added a backing_format field to BlockDriverState.
Added bdrv_create2 and drv->bdrv_create2 to create an image with
a known backing file format.
Upon bdrv_open2 if backing format is known use it, instead of
probing the (backing) image.
Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6908 c046a42c-6fe2-441c-8c8c-71466251a162
Okay, I started looking into how to handle scsi-generic I/O in the
new world order.
I think the best is to use the SG_IO ioctl instead of the read/write
interface as that allows us to support scsi passthrough on disk/cdrom
devices, too. See Hannes patch on the kvm list from August for an
example.
Now that we always do ioctls we don't need another abstraction than
bdrv_ioctl for the synchronous requests for now, and for asynchronous
requests I've added a aio_ioctl abstraction keeping it simple.
Long-term we might want to move the ops to a higher-level abstraction
and let the low-level code fill out the request header, but I'm lazy
enough to leave that to the people trying to support scsi-passthrough
on a non-Linux OS.
Tested lightly by issuing various sg_ commands from sg3-utils in a guest
to a host CDROM device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6895 c046a42c-6fe2-441c-8c8c-71466251a162
If a bounced vectored aio fails immediately (the inner aio submission
returning NULL) then the bounce handler erronously returns an aio
request which will never be completed (and which crashes when cancelled).
Fix by detecting that the inner request has failed and propagating the
error.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6892 c046a42c-6fe2-441c-8c8c-71466251a162
Now that we have a dedicated acb pool for vector translation acbs, we can
store the vector translation state in the acbs instead of in an external
structure.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6873 c046a42c-6fe2-441c-8c8c-71466251a162
This allows us to remove a hack in the vectored aio cancellation code.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6871 c046a42c-6fe2-441c-8c8c-71466251a162
Move the AIOCB allocation code to use a dedicate structure, AIOPool. AIOCB
specific information, such as the AIOCB size and cancellation routine, is
moved into the pool.
At present, there is exactly one pool per block format driver, maintaining
the status quo.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6870 c046a42c-6fe2-441c-8c8c-71466251a162
Now that scsi generic no longer uses bdrv_pread() and bdrv_pwrite(), we can
drop the corresponding internal APIs, which overlap bdrv_read()/bdrv_write()
and, being byte oriented, are unnatural for a block device.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6824 c046a42c-6fe2-441c-8c8c-71466251a162
Add an internal API for the generic block layer to send scsi generic commands
to block format driver. This means block format drivers no longer need
to consider overloaded nb_sectors parameters.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6823 c046a42c-6fe2-441c-8c8c-71466251a162
When a scsi device is backed by a scsi generic device instead of an
ordinary host block device, the block API is abused in a couple of annoying
ways:
- nb_sectors is negative, and specifies a byte count instead of a sector count
- offset is ignored, since scsi-generic is essentially a packet protocol
This overloading makes hacking the block layer difficult. Remove it by
introducing a new explicit API for scsi-generic devices. The new API
is still backed by the old implementation, but at least the users are
insulated.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6822 c046a42c-6fe2-441c-8c8c-71466251a162
This series is broken by design as it requires expensive IO operations at
open time causing very long delays when starting a virtual machine for the
first time.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6814 c046a42c-6fe2-441c-8c8c-71466251a162
This series is broken by design as it requires expensive IO operations at
open time causing very long delays when starting a virtual machine for the
first time.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6813 c046a42c-6fe2-441c-8c8c-71466251a162
We want to globally define WIN_LEAN_AND_MEAN and WINVER to particular values so
let's do it in OS_CFLAGS.
Then, we can pepper in windows.h includes where using #includes that require it.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6783 c046a42c-6fe2-441c-8c8c-71466251a162
Refactor the monitor API and prepare it for decoupled terminals:
term_print functions are renamed to monitor_* and all monitor services
gain a new parameter (mon) that will once refer to the monitor instance
the output is supposed to appear on. However, the argument remains
unused for now. All monitor command callbacks are also extended by a mon
parameter so that command handlers are able to pass an appropriate
reference to monitor output services.
For the case that monitor outputs so far happen without clearly
identifiable context, the global variable cur_mon is introduced that
shall once provide a pointer either to the current active monitor (while
processing commands) or to the default one. On the mid or long term,
those use case will be obsoleted so that this variable can be removed
again.
Due to the broad usage of the monitor interface, this patch mostly deals
with converting users of the monitor API. A few of them are already
extended to pass 'mon' from the command handler further down to internal
functions that invoke monitor_printf.
At this chance, monitor-related prototypes are moved from console.h to
a new monitor.h. The same is done for the readline API.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6711 c046a42c-6fe2-441c-8c8c-71466251a162
Currently, waiting for the user to type in some password blocks the
whole VM because monitor_readline starts its own I/O loop. And this loop
also screws up reading passwords from virtual console.
Patch below fixes the shortcomings by using normal I/O processing also
for waiting on a password. To keep to modal property for the monitor
terminal, the command handler is temporarily replaced by a password
handler and a callback infrastructure is established to process the
result before switching back to command mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6710 c046a42c-6fe2-441c-8c8c-71466251a162
Reading the passwords for encrypted hard disks during early startup is
broken (I guess for quiet a while now):
- No monitor terminal is ready for input at this point
- Forcing all mux'ed terminals into monitor mode can confuse other
users of that channels
To overcome these issues and to lay the ground for a clean decoupling of
monitor terminals, this patch changes the initial password inquiry as
follows:
- Prevent autostart if there is some encrypted disk
- Once the user tries to resume the VM, prompt for all missing
passwords
- Only resume if all passwords were accepted
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6707 c046a42c-6fe2-441c-8c8c-71466251a162
If the backing file is encrypted, 'info block' currently does not report
the disk as encrypted. Fix this by using the standard API to check disk
encryption mode. Moreover, switch to a canonical output format.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6706 c046a42c-6fe2-441c-8c8c-71466251a162
Introduce bdrv_get_encrypted_filename service to allow more informative
password prompting.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6704 c046a42c-6fe2-441c-8c8c-71466251a162
Make bdrv_iterate more useful by passing the BlockDriverState to the
iterator instead of the device name.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6703 c046a42c-6fe2-441c-8c8c-71466251a162
Make sure that we always delete temporary disk images on error, remove
obsolete malloc error checks and return proper error codes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6702 c046a42c-6fe2-441c-8c8c-71466251a162
Introduce a growable flag that's set by bdrv_file_open(). Block devices should
never be growable, only files that are being used by block devices.
I went through Fabrice's early comments about the patch that was first applied.
While I disagree with that patch, I also disagree with Fabrice's suggestion.
There's no good reason to do the checks in the block drivers themselves. It
just increases the possibility that this bug could show up again. Since we're
calling bdrv_getlength() to determine the length, we're giving the block drivers
a chance to chime in and let us know what range is valid.
Basically, this patch makes the BlockDriver API guarantee that all requests are
within 0..bdrv_getlength() which to me seems like a Good Thing.
What do others think?
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6677 c046a42c-6fe2-441c-8c8c-71466251a162
'num_free_bytes' is the number of non-allocated bytes below highest-allocation.
It's useful, together with the highest-allocation, to figure out how
fragmented the image is, and how likely it will run out-of-space soon.
For example when the highest allocation is high (almost end-of-disk), but
many bytes (clusters) are free, and can be re-allocated when neeeded, than
we know it's probably not going to reach end-of-disk-space soon.
Added bookkeeping to block-qcow2.c
Export it using BlockDeviceInfo
Show it upon 'info blockstats' if BlockDeviceInfo exists
Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6407 c046a42c-6fe2-441c-8c8c-71466251a162
Most devices that are capable of DMA are also capable of scatter-gather.
With the memory mapping API, this means that the device code needs to be
able to access discontiguous host memory regions.
For block devices, this translates to vectored I/O. This patch implements
an aynchronous vectored interface for the qemu block devices. At the moment
all I/O is bounced and submitted through the non-vectored API; in the future
we will convert block devices to natively support vectored I/O wherever
possible.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6397 c046a42c-6fe2-441c-8c8c-71466251a162
the <sys/queue.h> system header. <sys/disk.h> uses SLIST_ENTRY
on NetBSD, which doesn't exist in sys-queue.h. Therefore,
include <sys/queue.h> before including sys-queue.h.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5885 c046a42c-6fe2-441c-8c8c-71466251a162
Virtio will want to use the geometry detection code. It doesn't belong
in ide.c anyway.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5797 c046a42c-6fe2-441c-8c8c-71466251a162
Generate an option rom instead of using a hijacked boot sector for kernel
booting. This just requires adding a small option ROM header and a few more
instructions to the boot sector to take over the int19 vector and run our
boot code.
A disk is no longer needed when using -kernel on x86.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5650 c046a42c-6fe2-441c-8c8c-71466251a162
This patch changes the cache= option to accept none, writeback, or writethough
to control the host page cache behavior. By default, writethrough caching is
now used which internally is implemented by using O_DSYNC to open the disk
images. When using -snapshot, writeback is used by default since data integrity
it not at all an issue.
cache=none has the same behavior as cache=off previously. The later syntax is
still supported by now deprecated. I also cleaned up the O_DIRECT
implementation to avoid many of the #ifdefs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5485 c046a42c-6fe2-441c-8c8c-71466251a162
This patch adds a bdrv_flush_all() function. It's necessary to ensure that all
IO operations have been flushed to disk before completely a live migration.
N.B. we don't actually use this now. We really should flush the block drivers
using an live savevm callback to avoid unnecessary guest down time.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5432 c046a42c-6fe2-441c-8c8c-71466251a162
This patch refactors the AIO layer to allow multiple AIO implementations. It's
only possible because of the recent signalfd() patch.
Right now, the AIO infrastructure is pretty specific to the block raw backend.
For other block devices to implement AIO, the qemu_aio_wait function must
support registration. This patch introduces a new function,
qemu_aio_set_fd_handler, which can be used to register a file descriptor to be
called back. qemu_aio_wait() now polls a set of file descriptors registered
with this function until one becomes readable or writable.
This patch should allow the implementation of alternative AIO backends (via a
thread pool or linux-aio) and AIO backends in non-traditional block devices
(like NBD).
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5297 c046a42c-6fe2-441c-8c8c-71466251a162
Right now, we sprinkle #if defined(QEMU_IMG) && defined(QEMU_NBD) all over the
code. It's ugly and causes us to have to build multiple object files for
linking against qemu and the tools.
This patch introduces a new file, qemu-tool.c which contains enough for
qemu-img, qemu-nbd, and QEMU to all share the same objects.
This also required getting qemu-nbd to be a bit more Windows friendly. I also
changed the Windows block-raw to use normal IO instead of overlapping IO since
we don't actually do AIO yet on Windows. I changed the various #if 0's to
#if WIN32_AIO to make it easier for someone to eventually fix AIO on Windows.
After this patch, there are no longer any #ifdef's related to qemu-img and
qemu-nbd.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5226 c046a42c-6fe2-441c-8c8c-71466251a162
realpath will horribly mangle a protocol so avoid calling it if the backing
file is a protocol.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5200 c046a42c-6fe2-441c-8c8c-71466251a162
OpenBSD doesn't use AIO so don't try to build compatfd when not using AIO.
Also make sure to call qemu_aio_init() from bdrv_init. Everything that uses
bdrv calls bdrv_init so it makes sense to init aio from there instead of
in every single tool.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5197 c046a42c-6fe2-441c-8c8c-71466251a162
This patch introduces signalfd() to work around the signal/select race in
checking for AIO completions. For platforms that don't support signalfd(), we
emulate it with threads.
There was a long discussion about this approach. I don't believe there are any
fundamental problems with this approach and I believe eliminating the use of
signals is a good thing.
I've tested Windows and Linux using Windows and Linux guests. I've also checked
for disk IO performance regressions.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5187 c046a42c-6fe2-441c-8c8c-71466251a162
Right now, the Windows build is broken because of NBD. Using a mingw32 cross
compiler is also badly broken.
This patch fixes the Windows build by stubbing out NBD support until someone
fixes it for Windows. It also santizing the mingw32 cross compiler support
by replacing the --enable-mingw32 option with a compiler check to determine
if we're on windows or not.
Also remove the weird SDL pseudo-detection for mingw32 using a cross compiler.
The hardcoded sdl-config name is seemly arbitrary. If you cross compiler SDL
correctly and modify your PATH variable appropriately, it will Just Work when
cross compiling.
The audio driver detection is also broken for cross compiling so you have to
specify the audio drivers explicitly for now.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5046 c046a42c-6fe2-441c-8c8c-71466251a162
Qemu 0.9.1 and earlier does not perform range checks for block device
read or write requests, which allows guest host users with root
privileges to access arbitrary memory and escape the virtual machine.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4037 c046a42c-6fe2-441c-8c8c-71466251a162
Remove QEMU_TOOL. Replace with QEMU_IMG and NEED_CPU_H.
Avoid linking qemu-img against whole system emulatior.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3578 c046a42c-6fe2-441c-8c8c-71466251a162