I/O throttling relies on bdrv_acct_done() which is called when a request
completes. This leaves a blind spot since we only charge for completed
requests, not submitted requests.
For example, if there is 1 operation remaining in this time slice the
guest could submit 3 operations and they will all be submitted
successfully since they don't actually get accounted for until they
complete.
Originally we probably thought this is okay since the requests will be
accounted when the time slice is extended. In practice it causes
fluctuations since the guest can exceed its I/O limit and it will be
punished for this later on.
Account for I/O upon submission so that I/O limits are enforced
properly.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.
In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.
Now an NBD connection can be opened like this:
qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
If a driver needs structured data and not just a string, it can provide
a .bdrv_parse_filename callback now that parses the command line string
into separate options. Keeping this separate from .bdrv_open_filename
ensures that the preferred way of directly specifying the options always
works as well if parsing the string works.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The NBD block supports an URL syntax, for which a URL parser returns
separate hostname and port fields. It also supports the traditional qemu
syntax encoded in a filename. Until now, after parsing the URL to get
each piece of information, a new string is built to be fed to socket
functions.
Instead of building a string in the URL case that is immediately parsed
again, parse the string in both cases and use the QemuOpts interface to
qemu-sockets.c.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
CoQueue uses a BH to awake coroutines that were made ready to run again
using qemu_co_queue_next() or qemu_co_queue_restart_all(). The BH
currently runs in the iothread AioContext and would break coroutines
that run in a different AioContext.
This is a slightly tricky problem because the lifetime of the BH exceeds
that of the CoQueue. This means coroutines can be awoken after CoQueue
itself has been freed. Also, there is no qemu_co_queue_destroy()
function which we could use to handle freeing resources.
Introducing qemu_co_queue_destroy() has a ripple effect of requiring us
to also add qemu_co_mutex_destroy() and qemu_co_rwlock_destroy(), as
well as updating all callers. Avoid doing that.
We also cannot switch from BH to GIdle function because aio_poll() does
not dispatch GIdle functions. (GIdle functions make memory management
slightly easier because they free themselves.)
Finally, I don't want to move unlock_queue and unlock_bh into
AioContext. That would break encapsulation - AioContext isn't supposed
to know about CoQueue.
This patch implements a different solution: each qemu_co_queue_next() or
qemu_co_queue_restart_all() call creates a new BH and list of coroutines
to wake up. Callers tend to invoke qemu_co_queue_next() and
qemu_co_queue_restart_all() occasionally after blocking I/O, so creating
a new BH for each call shouldn't be massively inefficient.
Note that this patch does not add an interface for specifying the
AioContext. That is left to future patches which will convert CoQueue,
CoMutex, and CoRwlock to expose AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Now that each AioContext has a ThreadPool and the main loop AioContext
can be fetched with bdrv_get_aio_context(), we can eliminate the concept
of a global thread pool from thread-pool.c.
The submit functions must take a ThreadPool* argument.
block/raw-posix.c and block/raw-win32.c use
aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's
ThreadPool.
tests/test-thread-pool.c must be updated to reflect the new
thread_pool_submit() function prototypes.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
For now bdrv_get_aio_context() is just a stub that calls
qemu_aio_get_context() since the block layer is currently tied to the
main loop AioContext.
Add the stub now so that the block layer can begin accessing its
AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
This patch adds a ThreadPool to AioContext. It's possible that some
AioContext instances will never use the ThreadPool, so defer creation
until aio_get_thread_pool().
The reason why AioContext should have the ThreadPool is because the
ThreadPool is bound to a AioContext instance where the work item's
callback function is invoked. It doesn't make sense to keep the
ThreadPool pointer anywhere other than AioContext. For example,
block/raw-posix.c can get its AioContext's ThreadPool and submit work.
Special note about headers: I used struct ThreadPool in aio.h because
there is a circular dependency if aio.h includes thread-pool.h.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
ThreadPool is tied to an AioContext through its event notifier, which
dictates in which AioContext the work item's callback function will be
invoked.
In order to support multiple AioContexts we need to support multiple
ThreadPool instances.
This patch adds the new/free functions. The free function deserves
special attention because it quiesces remaining worker threads. This
requires a new condition variable and a "stopping" flag to let workers
know they should terminate once idle.
We never needed to do this before since the global threadpool was not
explicitly destroyed until process termination.
Also stash the AioContext pointer in ThreadPool so that we can call
aio_set_event_notifier() in thread_pool_free(). We didn't need to hold
onto AioContext previously since there was no free function.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is better to present homogeneous hardware independent of the storage
technology that is chosen on the host, hence we make discard a host
parameter; the user can choose whether to pass it down to the image
format and protocol, or to ignore it.
Using DISCARD with filesystems can cause very severe fragmentation, so it
is left default-off for now. This can change later when we implement the
"anchor" operation for efficient management of preallocated files.
There is still one choice to make: whether DISCARD has an effect on the
dirty bitmap or not. I chose yes, though there is a disadvantage: if
the guest is buggy and issues discards for data that is in use, there
will be no way to migrate storage for that guest without downgrading
the machine type to an older one.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There can be a need to turn output to stdout off. This patch adds a -q option
that enable "Quiet mode". In Quiet mode, only errors are printed out.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There's no synchronous wrapper for bdrv_co_is_allocated_above function
so it's not possible to check for sector allocation in an image with
a backing file.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Show how many clusters are compressed. This can be used to monitor how
many compressed clusters remain and whether to recompress the image.
Suggested-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds the support for reporting the image end offset (in
bytes). This is particularly useful after a conversion (or a rebase)
where the destination is a block device in order to find the first
unused byte at the end of the image.
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
AioHandler already has a GPollFD so we can directly use its
events/revents.
Add the int pollfds_idx field to AioContext so we can map g_poll(3)
results back to AioHandlers.
Reuse aio_dispatch() to invoke handlers after g_poll(3).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 1361356113-11049-10-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Introduce a new option "adapter_type" when converting to vmdk images.
It can be one of the following: ide (default), buslogic, lsilogic
or legacyESX (according to the vmdk spec from vmware).
In case of a non-ide adapter, heads is set to 255 instead of the 16.
The latter is used for "ide".
Also see LP#545089
Signed-off-by: Othmar Pasteka <pasteka@kabsi.at>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is needed in the following patch.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.
Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
aio_poll() must return true if any work is still pending, even if it
didn't make progress, so that bdrv_drain_all() doesn't stop waiting too
early. The possibility of stopping early occasionally lead to a failed
assertion in bdrv_drain_all(), when some in-flight request was missed
and the function didn't really drain all requests.
In order to make that change, the return value as specified in the
function comment must change for blocking = false; fortunately, the
return value of blocking = false callers is only used in test cases, so
this change shouldn't cause any trouble.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The qiov_is_aligned() function checks whether a QEMUIOVector meets a
BlockDriverState's alignment requirements. This is needed by
virtio-blk-data-plane so:
1. Move the function from block/raw-posix.c to block/block.c.
2. Make it public in block/block.h.
3. Rename to bdrv_qiov_is_aligned().
4. Change return type from int to bool.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled. This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>