filename was still uninitialised when it's used as a parameter to a
tracing function, so let's move the initialisation. Also, commit c2ad1b0c
forgot to add a NULL check, which this patch adds while we're at it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1366645720-11384-1-git-send-email-kwolf@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If a filename is passed in the driver-specific options from the command
line, the backing file path from the image is ignored now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This allows using the file.filename option instead of the string that
comes from -drive file=... and is passed around as a separate parameter.
The goal is to get rid of this parameter and use the options QDict more
consistently.
With this option you can access not only the top-level image, but
specify a filename for the backing file (currently only if no backing
file exists, but we'll allow overriding it later)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Options starting in "backing." are passed to the backing file now. If
you don't need to specify the filename for the backing file, you can add
it on the command line instead of in the image file:
$ qemu-nbd -t /tmp/test.img
$ qemu-img create -f qcow2 empty.qcow2 1G
$ qemu-system-x86_64 -drive file=empty.qcow2,backing.file.driver=nbd,\
backing.file.host=localhost
Note that this doesn't override the backing filename from the image. If
the image has one, this will fail because NBD doesn't want the options
and a filename at the same time.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Specifying the wrong driver could fail an assertion:
$ qemu-system-x86_64 -drive file.driver=qcow2,file=x
qemu-system-x86_64: block.c:721: bdrv_open_common: Assertion `file !=
((void *)0)' failed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Directly pass the QEMUIOVector on instead of linearising it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The wait_time variable is in seconds. Reflect this in a comment and use
NANOSECONDS_PER_SECOND instead of BLOCK_IO_SLICE_TIME * 10 (which
happens to have the right value).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The current slice is extended when an I/O request exceeds the limit.
There is no need to extend the slice every time we check a request.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is not necessary to adjust the slice time at runtime. We already
extend the current slice in order to carry over accounting into the next
slice. Changing the actual slice time value introduces oscillations.
The guest may experience large changes in throughput or IOPS from one
moment to the next when slice times are adjusted.
Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I/O throttling relies on bdrv_acct_done() which is called when a request
completes. This leaves a blind spot since we only charge for completed
requests, not submitted requests.
For example, if there is 1 operation remaining in this time slice the
guest could submit 3 operations and they will all be submitted
successfully since they don't actually get accounted for until they
complete.
Originally we probably thought this is okay since the requests will be
accounted when the time slice is extended. In practice it causes
fluctuations since the guest can exceed its I/O limit and it will be
punished for this later on.
Account for I/O upon submission so that I/O limits are enforced
properly.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open_common() implements direct use of protocols by copying the
pre-opened BlockDriverStates to bs using bdrv_swap(). It did however
first set some fields in bs, which end up in file after the swap. When
bdrv_open() destroys file, it appears to be open, and because it isn't,
qemu could segfault while trying to close it.
Reorder the operations to return immediately in such cases so that file
is correctly detected as closed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.
In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.
Now an NBD connection can be opened like this:
qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bdrv_open() uses two different variables called options. Rename one of
them to avoid confusion and to allow the outer one to be accessed
everywhere.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
If a driver needs structured data and not just a string, it can provide
a .bdrv_parse_filename callback now that parses the command line string
into separate options. Keeping this separate from .bdrv_open_filename
ensures that the preferred way of directly specifying the options always
works as well if parsing the string works.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Specify -drive file.option=... on the command line to pass the option to
the protocol instead of the format driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
brdv_truncate() is also called from readv/writev commands on self-
growing file based storage. this will result in requests waiting
for theirselves to complete.
This reverts commit 9a665b2b86.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
realpath(3) is used to get an absolute path to the image file when
creating a -drive snapshot=on temporary qcow2. This does not work for
protocols since their filenames ("proto:foo:...") do not correspond to
file system paths.
Commit 7c96d46ec2 ("Let snapshot work with
protocols") skipped realpath(3) for protocols. Later on the "raw"
format was introduced and broke the check.
Use path_has_protocol(filename) to decide if this image uses a protocol
or a filename.
Reported-by: Richard Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now bdrv_get_aio_context() is just a stub that calls
qemu_aio_get_context() since the block layer is currently tied to the
main loop AioContext.
Add the stub now so that the block layer can begin accessing its
AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The options are passed down to the block drivers, which are supposed to
remove all options they have processed. Anything that is left over in
the end is an unknown option and results in an error.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
During a commit of 'all' using the HMP non-live commit, the operation
is aborted and returns error on the first error enountered. When
non-COW drives are in use (e.g. ejected floppy, cdrom, or drives without
a backing parent), that means a commit all will return an error of either
-ENOMEDIUM or -ENOTSUP. This is not desirable, so for the 'all' commit
case, only attempt the commit if both bs->drv and bs->backing_hd are
present.
More succinctly: 'commit all' now means a commit on all COW drives.
This means an individual commit to a specific non-COW drive will still
return the appropriate error (-ENOMEDIUM if eject / not present, -ENOTSUP
if no backing file).
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is better to present homogeneous hardware independent of the storage
technology that is chosen on the host, hence we make discard a host
parameter; the user can choose whether to pass it down to the image
format and protocol, or to ignore it.
Using DISCARD with filesystems can cause very severe fragmentation, so it
is left default-off for now. This can change later when we implement the
"anchor" operation for efficient management of preallocated files.
There is still one choice to make: whether DISCARD has an effect on the
dirty bitmap or not. I chose yes, though there is a disadvantage: if
the guest is buggy and issues discards for data that is in use, there
will be no way to migrate storage for that guest without downgrading
the machine type to an older one.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_truncate() invalidates the bdrv_check_request() result for
in-flight requests, so there should better be none.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There can be a need to turn output to stdout off. This patch adds a -q option
that enable "Quiet mode". In Quiet mode, only errors are printed out.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There's no synchronous wrapper for bdrv_co_is_allocated_above function
so it's not possible to check for sector allocation in an image with
a backing file.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In an image chain, if the base image is smaller than the current
image, we need to make sure to use the current images count of
unallocated blocks once we get to the end of the base image. Without
this change the code will return 0 blocks when it gets to the end
of the base image and mirror_run will fail its assertion.
Signed-off-by: Vishvananda Ishaya <vishvananda@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is needed in the following patch.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.
Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Note that resetting bits in the dirty bitmap is done _before_ actually
processing the request. Writes, instead, set bits after the request
is completed.
This way, when there are concurrent write and discard requests, the
outcome will always be that the blocks are marked dirty. This scenario
should never happen, but it is safer to do it this way.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_io_limits_enable() starts a new slice, but does not set io_base
correctly for that slice.
Here is how io_base is used:
bytes_base = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write];
bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
if (bytes_base + bytes_res <= bytes_limit) {
/* no wait */
} else {
/* operation needs to be throttled */
}
As a result, any I/O operations that are triggered between now and
bs->slice_end are incorrectly limited. If 10 MB of data has been
written since the VM was started, QEMU thinks that 10 MB of data has
been written in this slice. This leads to a I/O lockup in the guest.
We fix this by delaying the start of a new slice to the next
call of bdrv_exceed_io_limits().
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The qiov_is_aligned() function checks whether a QEMUIOVector meets a
BlockDriverState's alignment requirements. This is needed by
virtio-blk-data-plane so:
1. Move the function from block/raw-posix.c to block/block.c.
2. Make it public in block/block.h.
3. Rename to bdrv_qiov_is_aligned().
4. Change return type from int to bool.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A blank CD or DVD is visible as a zero-sized disks. Probing such
disks will lead to an EIO and a failure to start the VM. Treating
them as raw is a better solution.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This allows removing of MinGW specific code and improves
reentrancy for POSIX hosts.
[Removed unused ret variable in qemu_get_timedate() to fix warning:
vl.c: In function ‘qemu_get_timedate’:
vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable]
-- Stefan Hajnoczi]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This makes the blkdebug suspend/resume functionality available in
qemu-io. Use it like this:
$ ./qemu-io blkdebug::/tmp/test.qcow2
qemu-io> break write_aio req_a
qemu-io> aio_write 0 4k
qemu-io> blkdebug: Suspended request 'req_a'
qemu-io> resume req_a
blkdebug: Resuming request 'req_a'
qemu-io> wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit adds an Error ** argument to bdrv_img_create() and set it
appropriately on error.
Callers of bdrv_img_create() pass NULL for the new argument and still
rely on bdrv_img_create()'s return value. Next commits will change
callers to use the Error object instead.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes problems that are caused by the additional open/close cycle
of the existing format probing, for example related to qemu-nbd without
-t option or file descriptor passing.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of waiting for all requests to complete, wait just for the
specific request that should be cancelled.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The local string tmp_filename is passed to function get_tmp_filename
which expects a string with minimum size MAX_PATH for w32 hosts.
MAX_PATH is 260 and PATH_MAX is 259, so tmp_filename was too short.
Commit eba25057b9 introduced this
regression.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Now that AIOPool no longer keeps a freelist, it isn't really a "pool"
anymore. Rename it to AIOCBInfo and make it const since it no longer
needs to be modified.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
AIO control blocks are frequently acquired and released because each aio
request involves at least one AIOCB. Therefore, we pool them to avoid
heap allocation overhead.
The problem with the freelist approach in AIOPool is thread-safety. If
we want BlockDriverStates to associate with AioContexts that execute in
multiple threads, then a global freelist becomes a problem.
This patch drops the freelist and instead uses g_slice_alloc() which is
tuned for per-thread fixed-size object pools. qemu_aio_get() and
qemu_aio_release() are now thread-safe.
Note that the change from g_malloc0() to g_slice_alloc() should be safe
since the freelist reuse case doesn't zero the AIOCB either.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* kwolf/for-anthony: (32 commits)
osdep: Less restrictive F_SEFL in qemu_dup_flags()
qemu-iotests: add testcases for mirroring on-source-error/on-target-error
qmp: add pull_event function
mirror: add support for on-source-error/on-target-error
iostatus: forward block_job_iostatus_reset to block job
qemu-iotests: add mirroring test case
mirror: implement completion
qmp: add drive-mirror command
mirror: introduce mirror job
block: introduce BLOCK_JOB_READY event
block: add block-job-complete
block: rename block_job_complete to block_job_completed
block: export dirty bitmap information in query-block
block: introduce new dirty bitmap functionality
block: add bdrv_open_backing_file
block: add bdrv_query_stats
block: add bdrv_query_info
qemu-config: Add new -add-fd command line option
monitor: Prevent removing fd from set during init
monitor: Enable adding an inherited fd to an fd set
...
Conflicts:
vl.c
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Assert that write_compressed is never used with the dirty bitmap.
Setting the bits early is wrong, because a coroutine might concurrently
examine them and copy incomplete data from the source.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mirroring runs without the backing file so that it can be copied outside
QEMU. However, we need to add it at the time the job is completed and
QEMU switches to the target. Factor out the common bits of opening an
image and completing a mirroring operation.
The new function does not assume that the file is closed immediately after
it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qmp_query_blockstat cannot have errors, remove the Error argument and
create a new public function bdrv_query_stats out of it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, bdrv_find_backing_image compares bs->backing_file with
what is passed in as a backing_file name. Mismatches may occur,
however, when bs->backing_file and backing_file are not both
absolute or relative.
Use path_combine() to make sure any relative backing filenames are
relative to the current image filename being searched, and then use
realpath() to make all comparisons based on absolute filenames.
If either backing_file or bs->backing_file is determine to be a
protocol, then no filename normalization is performed.
This also changes bdrv_find_backing_image to no longer be recursive,
but iterative.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The first user of close notifiers will be the embedded NBD server.
It would be possible to use them to do some of the ad hoc processing
(e.g. for block jobs and I/O limits) that is currently done by
bdrv_close.
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no reason in principle to skip job cancellation and draining
of pending I/O when there is no medium in the disk. Do these unconditionally,
which also prepares the code for the next patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Also, use PATH_MAX, rather than the arbitrary 1024.
Using PATH_MAX is more consistent with other filename-related
variables in this file, like backing_filename and tmp_filename.
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The following behaviors are possible:
'report': The behavior is the same as in 1.1. An I/O error,
respectively during a read or a write, will complete the job immediately
with an error code.
'ignore': An I/O error, respectively during a read or a write, will be
ignored. For streaming, the job will complete with an error and the
backing file will be left in place. For mirroring, the sector will be
marked again as dirty and re-examined later.
'stop': The job will be paused and the job iostatus will be set to
failed or nospace, while the VM will keep running. This can only be
specified if the block device has rerror=stop and werror=stop or enospc.
'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.
In all cases, even for 'report', the I/O error is reported as a QMP
event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.
It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
This is not really avoidable since stopping the VM completes all pending
I/O requests. In fact, it is already possible now that a series of
BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
calls bdrv_drain_all and this can generate further errors.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the common part of IDE/SCSI/virtio error handling to the block
layer. The new function bdrv_error_action subsumes all three of
bdrv_emit_qmp_error_event, vm_stop, bdrv_iostatus_set_err.
The same scheme will be used for errors in block jobs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Do this while we are touching this part of the code, before introducing
more uses of "int is_read".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will let block-stream reuse the enum. Places that used the enums
are renamed accordingly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers;
drivers should only be told whether to stop/report/ignore the error.
On the other hand, we want to keep using the nicer BlockErrorAction
name in the drivers. So rename the enums, while leaving aside the
names of the enum values for now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a simple helper function, that will return the base image
of a given image chain.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add bdrv_find_overlay(), and bdrv_drop_intermediate().
bdrv_find_overlay(): given 'bs' and the active (topmost) BDS of an image chain,
find the image that is the immediate top of 'bs'
bdrv_drop_intermediate():
Given 3 BDS (active, top, base), drop images above
base up to and including top, and set base to be the
backing file of top's overlay node.
E.g., this converts:
bottom <- base <- intermediate <- top <- active
to
bottom <- base <- active
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The keep_read_only flag is no longer used, in favor of the bdrv
flag BDRV_O_ALLOW_RDWR.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, bdrv_commit() reopens images r/w itself, via risky
_delete() and _open() calls. Use the new safe method for drive reopen.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is based on Supriya Kannery's bdrv_reopen() patch series.
This provides a transactional method to reopen multiple
images files safely.
Image files are queue for reopen via bdrv_reopen_queue(), and the
reopen occurs when bdrv_reopen_multiple() is called. Changes are
staged in bdrv_reopen_prepare() and in the equivalent driver level
functions. If any of the staged images fails a prepare, then all
of the images left untouched, and the staged changes for each image
abandoned.
Block drivers are passed a reopen state structure, that contains:
* BDS to reopen
* flags for the reopen
* opaque pointer for any driver-specific data that needs to be
persistent from _prepare to _commit/_abort
* reopen queue pointer, if the driver needs to queue additional
BDS for a reopen
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_set_enable_write_cache() sets the bs->enable_write_cache flag,
but without the flag recorded in bs->open_flags, then next time
a reopen() is performed the enable_write_cache setting may be
inadvertently lost.
This will set the flag in open_flags, so it is preserved across
reopens.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I believe the bs->keep_read_only flag is supposed to reflect
the initial open state of the device. If the device is initially
opened R/O, then commit operations, or reopen operations changing
to R/W, are prohibited.
Currently, the keep_read_only flag is only accurate for the active
layer, and its backing file. Subsequent images end up always having
the keep_read_only flag set.
For instance, what happens now:
[ base ] kro = 1, ro = 1
|
v
[ snap-1 ] kro = 1, ro = 1
|
v
[ snap-2 ] kro = 0, ro = 1
|
v
[ active ] kro = 0, ro = 0
What we want:
[ base ] kro = 0, ro = 1
|
v
[ snap-1 ] kro = 0, ro = 1
|
v
[ snap-2 ] kro = 0, ro = 1
|
v
[ active ] kro = 0, ro = 0
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The caller would not delete temporary file after failed get_tmp_filename().
Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The tray status should change also if you eject empty block device.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 29cdb251 already added a comment that no unnecessary flushes to
disk will occur, this patch makes the code even get to the point of the
comment. This is mostly theoretical because in practice we only stack
one format on top of one protocol, the former implementing flush_to_os
and the latter only flush_to_disk. It starts to matter when drivers that
are not on top implement flush_to_os.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Use the dedicated counting function in qmp_query_block in order to
propagate the backing file depth to HMP and add backing_file_depth
to qmp-commands.hx
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Create bdrv_get_backing_file_depth() in order to be able to show
in QMP and HMP how many ancestors backing an image a block device
have.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
There are two producers of these hints: drive_init() on behalf of
-drive, and hd_geometry_guess().
The only consumer of the hint is hd_geometry_guess().
The callers of hd_geometry_guess() call it only when drive_init()
didn't set the hints. Therefore, drive_init()'s hints are never used.
Thus, hd_geometry_guess() only ever sees hints it produced itself in a
prior call. Only the first call computes something, subsequent calls
just repeat the first call's results. However, hd_geometry_guess() is
never called more than once: the device models don't, and the block
device is destroyed on unplug. Thus, dropping the repeat feature
doesn't break anything now.
If a block device wasn't destroyed on unplug and could be reused with
a new device, then repeating old results would be wrong. Thus,
dropping the repeat feature prevents future breakage.
This renders the hints unused. Purge them from the block layer.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit f3d54fc4 factored it out of hw/ide.c for reuse. Sensible,
except it was put into block.c. Device-specific functionality should
be kept in device code, not the block layer. Move it to
hw/hd-geometry.c, and make stylistic changes required to keep
checkpatch.pl happy.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 5bbdbb46 moved it to block.c because "other geometry guessing
functions already reside in block.c". Device-specific functionality
should be kept in device code, not the block layer. Move it back.
Disk geometry guessing is still in block.c. To be moved out in a
later patch series.
Bonus: the floppy type used in pc_cmos_init() now obviously matches
the one in the FDrive. Before, we relied on
bdrv_get_floppy_geometry_hint() picking the same type both in
fd_revalidate() and in pc_cmos_init().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* mjt/mjt-iov2:
rewrite iov_send_recv() and move it to iov.c
cleanup qemu_co_sendv(), qemu_co_recvv() and friends
export iov_send_recv() and use it in iov_send() and iov_recv()
rename qemu_sendv to iov_send, change proto and move declarations to iov.h
change qemu_iovec_to_buf() to match other to,from_buf functions
consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
allow qemu_iovec_from_buffer() to specify offset from which to start copying
consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset()
rewrite iov_* functions
change iov_* function prototypes to be more appropriate
virtio-serial-bus: use correct lengths in control_out() message
Conflicts:
tests/Makefile
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
To prepare move of guess_disk_lchs() into hw/, where it poking
BlockDriverState member io_limits_enabled directly would be unclean.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_get_floppy_geometry_hint() fails to store through its parameter
drive when bs has a geometry hint. Makes fd_revalidate() assign
random crap to drv->drive.
Has been broken that way for ages. Harmless, because:
* The only way to set a geometry hint is -drive if=none,cyls=...
Since commit c219331e, probably unintentional.
* The only use of drv->drive is as argument to another
bdrv_get_floppy_geometry_hint(). Which doesn't use it, since the
geometry hint is still there.
Drop the broken code, ignore -drive parameter cyls, heads and secs for
floppies even with if=none, just like before commit c219331e. Matches
-help, which explains cyls, heads, secs as "hard disk physical
geometry".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The new function can be made a bit nicer than bdrv_append. It swaps the
whole contents, and then swaps back (using the usual t=a;a=b;b=t idiom)
the fields that need to stay on top. Thus, it does not need explicit
bdrv_detach_dev, bdrv_iostatus_disable, etc.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While these should not be in use at the time a transaction is started,
a command in the prepare phase of a transaction might have added them,
so they need to be brought over.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
So callers don't need to know anything about maximum name length.
Returning a pointer is safe, because the name string lives as long as
the block driver it names, and block drivers don't die.
Requested by Peter Maydell.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Formats are entirely in charge of flushes for metadata writes. For
guest-initiated writes, a writethrough cache is faked in the block layer.
So we can always open in writeback mode.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Because the guest will be able to flip enable_write_cache, the actual
state may not match what is used to open the new snapshot.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to make the formats handle their own flushes
autonomously, while keeping for guests the ability to use a writethrough
cache. Since formats will write metadata via bs->file, bdrv_co_do_writev
is the only place where we need to add a flush.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The QED block driver already provides the functionality to not only
detect inconsistencies in images, but also fix them. However, this
functionality cannot be manually invoked with qemu-img, but the
check happens only automatically during bdrv_open().
This adds a -r switch to qemu-img check that allows manual invocation
of an image repair.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>