This patch implements all ioctls currently implemented by device mapper,
enabling us to run dmsetup and kpartx inside of linux-user.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
TaskState contains linux_bprm struct which encapsulates argv among
other things.
argv might be used around the code and is expected to contain valid
data. Before this patch, ts->bprm->argv was NULL due to it being
freed right after loader_exec().
Signed-off-by: Fabio Erculiani <lxnay@sabayon.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
With the current fake /proc/self/stat implementation `ps` is
segfaulting because it expects to read PID and argv[0] as first and
second field respectively, with the latter being enclosed between
backets.
Reproducing is as easy as running: `ps` inside qemu-user chroot
with /proc mounted.
Signed-off-by: Fabio Erculiani <lxnay@sabayon.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
From original commit with Patchwork-id: 31108 by
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
"The QED image format includes a file header bit to mark images dirty.
QED normally checks dirty images on open and fixes inconsistent
metadata. This is undesirable during live migration since the dirty bit
may be set if the source host is modifying the image file. The check
should be postponed until migration completes.
Skip operations that modify the image file if the BDRV_O_INCOMING flag
is set."
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The QED image is reopened to flush metadata and check consistency.
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Open images with BDRV_O_INCOMING in order to inform block drivers
that an incoming live migration is coming.
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This function will clear all BDRV_O_INCOMING flags.
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
From original patch with Patchwork-id: 31110 by
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
"Add a flag to indicate that incoming migration is pending and care needs
to be taken for data consistency. Block drivers should not modify the
image file before incoming migration is complete since the migration
source host is still using the image file."
The rationale for not using bdrv->read_only is the following.
"Unfortunately this is not possible because too many other places in QEMU
test bdrv_is_read_only() and use it for their own evil purposes. For
example, ide_init_drive() will error out because read-only harddisks are
not supported. We're mixing guest and host side read-only concepts so
this simpler alternative does not work."
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Close the now unused images that were part of the previous backing file
chain and adjust ->backing_hd, backing_filename and backing_format
properly.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=801449
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu-io requires options first, then fixed parameters.
GNU getopt also allows options at the end, but POSIX getopt
doesn't. Try "export POSIXLY_CORRECT=y" to get the POSIX
behaviour with GNU getopt, too.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu-img requires first options, then file name, then size.
GNU getopt also allows options at the end, but POSIX getopt
doesn't. Try "export POSIXLY_CORRECT=y" to get the POSIX
behaviour with GNU getopt, too.
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The immportant thing here is that header extensions don't get silently
dropped when the header is rewritten, e.g. during a rebase.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a tool that is meant to inspect and edit qcow2 files in a
low-level way, that wouldn't be possible with qemu-img/io, for example
by adding yet unknown extensions or flags. This way we can test whether
qemu deals properly with future backwards compatible extensions.
For now, let's start with the image header and header extensions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We should return if reading of the header fails.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Flush operation is supposed to flush the write-back cache of
sheepdog cluster.
By issuing flush operation, we can assure the Guest of data
reaching the sheepdog cluster storage.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A few fixups for bdrv_append():
The new bs (bs_new) passed into bdrv_append() should be anonymous. Rather
than call bdrv_make_anon() to enforce this, use an assert to catch when a caller
is passing in a bs_new that is not anonymous.
Also, the new top layer should have its backing_format reflect the original
top's format.
And last, after the swap of bs contents, the device_name will have been copied
down. This needs to be cleared to reflect the anonymity of the bs that was
pushed down.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some block drivers can verify their image files are clean or not. So we can show
it while using "qemu-img info".
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Discussion can be found at:
http://patchwork.ozlabs.org/patch/128730/
This patch add image fragmentation statistics while using qemu-img check.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I am not sure that these are really proper GtkDoc, but they follow
the existing documentation in block_int.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no need to do this in every implementation of set_speed
(even though there is only one right now).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Streaming can issue I/O while qcow2_close is running. This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device. The fix is to cancel streaming jobs when closing their
underlying device.
The cancellation must be synchronous, on the other hand qemu_aio_wait
will not restart a coroutine that is sleeping in co_sleep. So add
a flag saying whether streaming has in-flight I/O. If the busy flag
is false, the coroutine is quiescent and, when cancelled, will not
issue any new I/O.
This protects streaming against closing, but not against deleting.
We have a reference count protecting us against concurrent deletion,
but I still added an assertion to ensure nothing bad happens.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We do not want jobs to keep a device busy for a possibly very long
time, and management could become confused because they thought a
device was not even there anymore. So, cancel long-running jobs
as soon as their device is going to disappear.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently dma_bdrv_io() takes a 'to_dev' boolean parameter to
determine the direction of DMA it is emulating. We already have a
DMADirection enum designed specifically to encode DMA directions.
This patch uses it for dma_bdrv_io() as well. This involves removing
the DMADirection definition from the #ifdef it was inside, but since that
only existed to protect the definition of dma_addr_t from places where
config.h is not included, there wasn't any reason for it to be there in
the first place.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Finally reindent all code and change goto statements to a loop.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reads and writes to the underlying file can also occur with the simple
non-vectored I/O interfaces.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vdi.c really works as if it implemented bdrv_read and bdrv_write. However,
because only vector I/O is supported by the asynchronous callbacks, it
went through extra pain to bounce-buffer the I/O. This can be handled
by the block layer now that the format is coroutine-based.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Most of the AIOCB really holds local variables that need to persist
across callback invocation. It can go away now.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now inline the former AIO callbacks into vdi_co_readv and vdi_co_writev.
While many cleanups are possible, the code now really looks synchronous.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The next step is to take code that only triggers after the first operation,
and move it at the end of vdi_aio_read_cb and vdi_aio_write_cb.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Even a basic conversion changing the bdrv_aio_readv/bdrv_aio_writev calls
to bdrv_co_readv/bdrv_co_writev, and callbacks to goto statements can
eliminate a lot of code. This is because error handling is simplified
and indirections through bottom halves can go away.
After this patch, I/O to the underlying file already happens via
coroutines, but the code still looks a lot like if asynchronous I/O was
being used.
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Nicolae Mogoreanu <mogo@google.com> noticed that I/O requests can lead
to QEMU crashes when the logical_block_size property is smaller than 512
bytes.
Using the new "blocksize" property we can properly enforce constraints
on the block size such that QEMU's block layer is able to operate
correctly.
Reported-by: Nicolae Mogoreanu <mogo@google.com>
Reported-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Storage interfaces like virtio-blk can be configured with block size
information so that the guest can take advantage of efficient I/O
request sizes.
According to the SCSI Block Commands (SBC) standard a device's block
size is "almost always greater than one byte and may be a multiple of
512 bytes". QEMU currently has a 512 byte minimum block size because
the block layer functions work at that granularity. Furthermore, the
block size should be a power of 2 because QEMU calculates bitmasks from
the value.
Introduce a "blocksize" property type so devices can enforce these
constraints on block size values. If the constraints are relaxed in the
future then this property can be updated.
Introduce the new PropertyValueNotPowerOf2 QError so QMP clients know
exactly why a block size value was rejected.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fix a typo in the description for QERR_PROPERTY_VALUE_OUT_OF_RANGE where
"'" was used instead of ")".
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After validation check, the 'checksum' is not written back
to footer, which leave it with zero.
This results in errors while loadding it under Microsoft's
Hyper-V environment, and also errors from utilities like
Citrix's vhd-util.
Signed-off-by: Zhang Shengju <sean_zhang@trendmicro.com.cn>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow the user to specify a disk's World Wide Name.
Linux guests can address disks by their unique World Wide Name number
(e.g. /dev/disk/by-id/wwn-0x5001517959123522). This patch adds support
for assigning a World Wide Name number to a virtual IDE disk.
Cc: kwolf@redhat.com
Signed-off-by: Floris Bos <dev@noc-ps.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
strncpy may not null-terminate the destination string.
Cc: kwolf@redhat.com
Signed-off-by: Floris Bos <dev@noc-ps.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow the user to override the default disk model name "QEMU HARDDISK".
Some Linux distributions use the /dev/disk/by-id/scsi-SATA_name-of-disk-
model_serial addressing scheme when refering to partitions in /etc/fstab
and elsewhere. This causes problems when starting a disk image taken from
an existing physical server under qemu, because when running under qemu
name-of-disk-model is always "QEMU HARDDISK".
This patch introduces a model=s option which in combination with the
existing serial=s option can be used to fake the disk the operating
system was previously on, allowing the OS to boot properly.
Cc: kwolf@redhat.com
Signed-off-by: Floris Bos <dev@noc-ps.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
And remove several block_int.h inclusions that should not be there.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It can be useful to enable QEMU tracing when trying out block layer
interfaces via qemu-io. Tracing can be enabled using the new -T FILE
option where the given file contains a list of trace events to enable
(just like the qemu --trace events=FILE option).
$ echo qemu_vfree >my-events
$ ./qemu-io -T my-events ...
Remember to use ./configure --enable-trace-backend=BACKEND when building
qemu-io.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since everything goes through the cache, callers don't use the L2 table
offset any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
It has happened more than once that patches that look perfectly sane
and work with simpletrace broke systemtap because they use 'next' as an
argument name for a tracing function. However, 'next' is a keyword for
systemtap, so we shouldn't use it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch resolves a bug in memory listener registration.
"range_add" callback was called on each section of the both
address space (IO and memory space) even if it doesn't match
the address space filter.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Avi Kivity <avi@redhat.com>