Commit Graph

58 Commits

Author SHA1 Message Date
Paolo Bonzini 6b98bd6495 block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini ce0f141259 block: introduce bdrv_no_throttling_begin/end
Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini b6e84c97ed block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain
Do not call bdrv_drain_recurse twice in bdrv_co_drain.  A small
tweak to the logic in Fam's patch, which is harmless since no
one implements bdrv_drain anyway.  But better get it right.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini a72f641407 block: move restarting of throttled reqs to block/throttle-groups.c
We want to remove throttled_reqs from block/io.c.  This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini 733bbc8cea block: make bdrv_start_throttled_reqs return void
The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 90c78624f1 block: Don't disable I/O throttling on sync requests
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Fam Zheng a77fd4bb29 block: Fix bdrv_drain in coroutine
Using the nested aio_poll() in coroutine is a bad idea. This patch
replaces the aio_poll loop in bdrv_drain with a BH, if called in
coroutine.

For example, the bdrv_drain() in mirror.c can hang when a guest issued
request is pending on it in qemu_co_mutex_lock().

Mirror coroutine in this case has just finished a request, and the block
job is about to complete. It calls bdrv_drain() which waits for the
other coroutine to complete. The other coroutine is a scsi-disk request.
The deadlock happens when the latter is in turn pending on the former to
yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
(assuming a qcow2 image):

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain
    while (has tracked request)
      aio_poll()

In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
because the mirror coroutine is blocked in the aio_poll(blocking=true).

With this patch, the added qemu_coroutine_yield() allows the scsi-disk
coroutine to make progress as expected:

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain.enter
>   schedule BH
>   qemu_coroutine_yield()
>                                  qcow2:qemu_co_mutex_lock.return
>                                  ...
                                   tracked request end
    ...
    (resumed from BH callback)
  bdrv_drain.return
  ...

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
Kevin Wolf 93f5e6d88a block: Introduce bdrv_co_writev_flags()
This function will allow drivers to implement BDRV_REQ_FUA natively
instead of sending a separate flush after the write.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf bfd18d1e0b block: Move enable_write_cache to BB level
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.

Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 855a6a93a1 block: Handle flush error in bdrv_pwrite_sync()
We don't want to silently ignore a flush error.

Also, there is little point in avoiding the flush for writethrough modes
and once WCE is moved to the BB layer, we definitely need the flush here
because bdrv_pwrite() won't involve one any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Pavel Dovgalyuk c32b82afaf block: add flush callback
This patch adds callback for flush request. This callback is responsible
for flushing whole block devices stack. bdrv_flush function does not
proceed to underlying devices. It should be performed by this callback
function, if needed.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange abb06c5ac1 block: add flag to indicate that no I/O will be performed
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info

This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Veronia Bahaa f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Kevin Wolf 5bd5119667 block: Pull up blk_read_unthrottled() implementation
Use blk_read(), so that it goes through blk_co_preadv() like all read
requests from the BB to the BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf a8823a3bfd block: Use blk_co_pwritev() for blk_write()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf 1bf1cbc91f block: Use blk_co_preadv() for blk_read()
This patch introduces blk_co_preadv() as a central function on the
BlockBackend level that is supposed to handle all read requests from the
BB to its root BDS eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz fe1a9cbc33 block: Move some bdrv_*_all() functions to BB
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Paolo Bonzini 9dcf8ecd9e block: add missing call to bdrv_drain_recurse
This is also needed in bdrv_drain_all, not just in bdrv_drain.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1450867706-19860-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-09 13:52:26 +00:00
Fam Zheng ac987b30d0 block: Use returned *file in bdrv_co_get_block_status
Now that all drivers return the right "file" pointer, we can use it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1453780743-16806-14-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 67a0fd2a9b block: Add "file" output parameter to block status query functions
The added parameter can be used to return the BDS pointer which the
valid offset is referring to. Its value should be ignored unless
BDRV_BLOCK_OFFSET_VALID in ret is set.

Until block drivers fill in the right value, let's clear it explicitly
right before calling .bdrv_get_block_status.

The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest
case 102 passing, and will be fixed once all drivers return the right file
pointer.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Kevin Wolf 04c01a5c8f block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE
Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf 09e0c771e4 block: Assert no write requests under BDRV_O_INCOMING
As long as BDRV_O_INCOMING is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20 13:36:23 +01:00
Peter Maydell 80c71a241a block: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-20 13:36:23 +01:00
Stefan Hajnoczi 222565f65c block: replace IOV_MAX with BlockLimits.max_iov
Request merging must not result in a huge request that exceeds the
maximum number of iovec elements.  Use BlockLimits.max_iov instead of
hardcoding IOV_MAX.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-22 16:01:07 +08:00
Stefan Hajnoczi bd44feb754 block: add BlockLimits.max_iov field
The maximum number of struct iovec elements depends on the
BlockDriverState.  The raw-posix and iSCSI protocols have a maximum of
IOV_MAX but others could have different values.

Cc: Peter Lieven <pl@kamp.de>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-22 16:01:07 +08:00
Paolo Bonzini ba88944495 block: fix bdrv_ioctl called from coroutine
When called from a coroutine, bdrv_ioctl must be asynchronous just like
e.g. bdrv_flush.  The code was incorrectly making it synchronous, fix
it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:44 +01:00
Fam Zheng 61408b250e block: Don't wait serialising for non-COR read requests
The assertion problem was noticed in 06c3916b35, but it wasn't
completely fixed, because even though the req is not marked as
serialising, it still gets serialised by wait_serialising_requests
against other serialising requests, which could lead to the same
assertion failure.

Fix it by even more explicitly skipping the serialising for this
specific case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-03 11:08:07 +08:00
Fam Zheng 67da1dc5ce block: Introduce BlockDriver.bdrv_drain callback
Drivers can have internal request sources that generate IO, like the
need_check_timer in QED. Since we want quiesced periods that contain
nested event loops in block layer, we need to have a way to disable such
event sources.

Block drivers must implement the "bdrv_drain" callback if it has any
internal sources that can generate I/O activity, like a timer or a
worker thread (even in a library) that can schedule QEMUBH in an
asynchronous callback.

Update the comments of bdrv_drain and bdrv_drained_begin accordingly.

Like bdrv_requests_pending(), we should consider all the children of bs.
Before, the while loop just works, as bdrv_requests_pending() already
tracks its children; now we mustn't miss the callback, so recurse down
explicitly.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng 5c5ae76acb block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both
Currently all drivers that support .bdrv_aio_ioctl also implement
.bdrv_ioctl redundantly.  To track ioctl requests in block layer it is
easier if we unify the two paths, because we'll need to run it in a
coroutine, as required by tracked_request_begin. While we're at it, use
.bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl().

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1447064214-29930-7-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng b1066c8755 block: Track discard requests
Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard,
so add tracked_request_begin/end calls around the loop.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:42 +01:00
Fam Zheng cdb5e3155e block: Track flush requests
Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add
tracked_request_begin and tracked_request_end pair in that function so
that all flush requests are now tracked.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:42 +01:00
Fam Zheng ebde595ce6 block: Add more types for tracked request
We'll track more request types besides read and write, change the
boolean field to an enum.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:08 +01:00
Kevin Wolf 37a639a7fb block: Consider all child nodes in bdrv_requests_pending()
The function manually recursed into bs->file and bs->backing to check
whether there were any requests pending, but it ignored other children.

There's no need to special case file and backing here, so just replace
these two explicit recursions by a loop recursing for all child nodes.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1446029211-27148-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Fam Zheng 51288d7917 block: Introduce "drained begin/end" API
The semantics is that after bdrv_drained_begin(bs), bs will not get new external
requests until the matching bdrv_drained_end(bs).

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Max Reitz 7f0e9da6f1 block: Move BlockAcctStats into BlockBackend
As the comment above bdrv_get_stats() says, BlockAcctStats is something
which belongs to the device instead of each BlockDriverState. This patch
therefore moves it into the BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz 53d8f9d8fb block: Remove wr_highest_sector from BlockAcctStats
BlockAcctStats contains statistics about the data transferred from and
to the device; wr_highest_sector does not fit in with the rest.

Furthermore, those statistics are supposed to be specific for a certain
device and not necessarily for a BDS (see the comment above
bdrv_get_stats()); on the other hand, wr_highest_sector may be a rather
important information to know for each BDS. When BlockAcctStats is
finally removed from the BDS, we will want to keep wr_highest_sector in
the BDS.

Finally, wr_highest_sector is renamed to wr_highest_offset and given the
appropriate meaning. Externally, it is represented as an offset so there
is no point in doing something different internally. Its definition is
changed to match that in qapi/block-core.json which is "the offset after
the greatest byte written to". Doing so should not cause any harm since
if external programs tried to calculate the volume usage by
(wr_highest_offset + 512) / volume_size, after this patch they will just
assume the volume to be full slightly earlier than before.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Kevin Wolf 439db28cf9 block/io: Make bdrv_requests_pending() public
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf 760e006384 block: Convert bs->backing_hd to BdrvChild
This is the final step in converting all of the BlockDriverState
pointers that block drivers use to BdrvChild.

After this patch, bs->children contains the full list of child nodes
that are referenced by a given BDS, and these children are only
referenced through BdrvChild, so that updating the pointer in there is
enough for changing edges in the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Kevin Wolf 9a4f4c3156 block: Convert bs->file to BdrvChild
This patch removes the temporary duplication between bs->file and
bs->file_child by converting everything to BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Paolo Bonzini c84b31926f block: switch from g_slice allocator to malloc
Simplify memory allocation by sticking with a single API.  GSlice
is not that fast anyway (tcmalloc/jemalloc are better).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-12 11:17:45 +01:00
Wen Congyang 9568b511c9 block: Introduce a new API bdrv_co_no_copy_on_readv()
In some cases, we need to disable copy-on-read, and just
read the data.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Message-id: 1441682913-14320-2-git-send-email-wency@cn.fujitsu.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-09-25 08:37:07 -04:00
Stefan Hajnoczi 7a63f3cdc4 block: update bdrv_drain_all()/bdrv_drain() comments
The doc comments for bdrv_drain_all() and bdrv_drain() are outdated:

 * The bdrv_drain() comment is a poor man's bdrv_lock()/bdrv_unlock()
   which Fam Zheng is currently developing.  Unfortunately this warning
   was never really enough because devices keep submitting I/O and op
   blockers don't prevent that.

 * The bdrv_drain_all() comment is still partially correct but reflects
   the nature of the implementation rather than API documentation.

Do make it clear that bdrv_drain() is only appropriate within an
AioContext.  For anything spanning AioContexts you need
bdrv_drain_all().

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1435854281-6078-1-git-send-email-stefanha@redhat.com
2015-07-07 10:31:08 +01:00
Alberto Garcia 764ba3ae51 block: remove redundant check before g_slist_find()
An empty GSList is represented by a NULL pointer, therefore it's a
perfectly valid argument for g_slist_find() and there's no need to
make any additional check.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1435583533-5758-1-git-send-email-berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02 10:06:23 +01:00
Fam Zheng 508249952c block: Fix dirty bitmap in bdrv_co_discard
Unsetting dirty globally with discard is not very correct. The discard may zero
out sectors (depending on can_write_zeroes_with_unmap), we should replicate
this change to destination side to make sure that the guest sees the same data.

Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator
doesn't expect unsetting of bits after current position.

So let's do it the opposite way which fixes both problems: set the dirty bits
if we are to discard it.

Reported-by: wangxiaolong@ucloud.cn
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02 10:06:23 +01:00
Fam Zheng ba3f0e2545 block: Add bdrv_get_block_status_above
Like bdrv_is_allocated_above, this function follows the backing chain until seeing
BDRV_BLOCK_ALLOCATED.  Base is not included.

Reimplement bdrv_is_allocated on top.

[Initialized bdrv_co_get_block_status_above() ret to 0 to silence
mingw64 compiler warning about the unitialized variable.  assert(bs !=
base) prevents that case but I suppose the program could be compiled
with -DNDEBUG.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02 10:03:50 +01:00
Dimitris Aragiorgis 1b6bc94d5d Fix migration in case of scsi-generic
During migration, QEMU uses fsync()/fdatasync() on the open file
descriptor for read-write block devices to flush data just before
stopping the VM.

However, fsync() on a scsi-generic device returns -EINVAL which
causes the migration to fail. This patch skips flushing data in case
of an SG device, since submitting SCSI commands directly via an SG
character device (e.g. /dev/sg0) bypasses the page cache completely,
anyway.

Note that fsync() not only flushes the page cache but also the disk
cache. The scsi-generic device never sends flushes, and for
migration it assumes that the same SCSI device is used by the
destination host, so it does not issue any SCSI SYNCHRONIZE CACHE
(10) command.

Finally, remove the bdrv_is_sg() test from iscsi_co_flush() since
this is now redundant (we flush the underlying protocol at the end
of bdrv_co_flush() which, with this patch, we never reach).

Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435056300-14924-3-git-send-email-dimara@arrikto.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-23 15:08:52 +01:00
Alexander Yarygin f406c03c09 block: Let bdrv_drain_all() to call aio_poll() for each AioContext
After the commit 9b536adc ("block: acquire AioContext in
bdrv_drain_all()") the aio_poll() function got called for every
BlockDriverState, in assumption that every device may have its own
AioContext. If we have thousands of disks attached, there are a lot of
BlockDriverStates but only a few AioContexts, leading to tons of
unnecessary aio_poll() calls.

This patch changes the bdrv_drain_all() function allowing it find shared
AioContexts and to call aio_poll() only for unique ones.

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1433936297-7098-4-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-23 15:06:16 +01:00
Markus Armbruster d49b683644 qerror: Move #include out of qerror.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22 18:20:40 +02:00
Alberto Garcia 76f4afb40f throttle: Add throttle group support
The throttle group support use a cooperative round robin scheduling
algorithm.

The principles of the algorithm are simple:
- Each BDS of the group is used as a token in a circular way.
- The active BDS computes if a wait must be done and arms the right
  timer.
- If a wait must be done the token timer will be armed so the token
  will become the next active BDS.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-12 14:00:00 +01:00