When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed that enters the connection
coroutine.
However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine. As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:
#0 qio_channel_detach_aio_context (ioc=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452
#1 0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151
#2 bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00,
new_context=new_context@entry=0x562a260c9580,
ignore=ignore@entry=0x7feeadc9b780)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230
#3 0x0000562a24282969 in bdrv_child_try_set_aio_context
(bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580,
ignore_child=<optimized out>, errp=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332
#4 0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0,
new_context=0x562a260c9580,
update_root_node=update_root_node@entry=true, errp=errp@entry=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989
#5 0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>,
new_context=<optimized out>, errp=errp@entry=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010
#6 0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized
out>)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
#7 0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245
#8 0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90,
running=0, state=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220
#9 0x0000562a2402ebfd in vm_state_notify (running=running@entry=0,
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275
#10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
send_stop=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032
#11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914
#12 migration_iteration_run (s=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275
#13 migration_thread (opaque=opaque@entry=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439
#14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519
#15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700)
at pthread_create.c:333
#16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908,
oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0
<__func__.28102>, newlen=0)
at ../sysdeps/unix/sysv/linux/sysctl.c:30
#17 0x0000000000000000 in ?? ()
Fix it by checking that the connection coroutine is non-null before
trying to enter it. If it is null, no entering is needed, as the
connection is probably going down anyway.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
When the reconnect in NBD client is in progress, the iochannel used for
NBD connection doesn't exist. Therefore an attempt to detach it from
the aio_context of the parent BlockDriverState results in a NULL pointer
dereference.
The problem is triggerable, in particular, when an outgoing migration is
about to finish, and stopping the dataplane tries to move the
BlockDriverState from the iothread aio_context to the main loop. If the
NBD connection is lost before this point, and the NBD client has entered
the reconnect procedure, QEMU crashes:
#0 qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
#1 0x00005618034b1b68 in nbd_client_attach_aio_context_bh (
opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
#2 0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
#3 0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80)
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
#4 aio_bh_poll (ctx=ctx@entry=0x5618056c7580)
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
#5 0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580,
blocking=blocking@entry=true)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
#6 0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580,
cb=<optimized out>, opaque=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
#7 0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580,
bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
#8 bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00,
new_context=new_context@entry=0x5618056c7580,
ignore=ignore@entry=0x7f60e1e63780)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
#9 0x000056180345c969 in bdrv_child_try_set_aio_context (
bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
ignore_child=<optimized out>, errp=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
#10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0,
new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
#11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>,
new_context=<optimized out>, errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
#12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
#13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
#14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90,
running=0, state=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
#15 0x0000561803208bfd in vm_state_notify (running=running@entry=0,
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
#16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
#17 0x00005618033e3765 in migration_completion (s=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
#18 migration_iteration_run (s=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
#19 migration_thread (opaque=opaque@entry=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
#20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
#21 0x00007f61085d06ba in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x0000000000000000 in ?? ()
Fix it by checking that the iochannel is non-null before trying to
detach it from the aio_context. If it is null, no detaching is needed,
and it will get reattached in the proper aio_context once the connection
is reestablished.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert now copy_range parameters which are already 64bit to signed
type.
It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH
(which is less than INT64_MAX), and do check the requests in
bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls
bdrv_check_request()).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been
updated, update all their wrappers.
For all of them type of 'bytes' is widening, so callers are safe. We
have update request_fn in blkverify.c simultaneously. Still it's just a
pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is
widening for callers of the request_fn anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.
bdrv_pad_request() is updated simultaneously, as pointer to bytes passed
to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part().
So, all callers of bdrv_pad_request() are updated to pass 64bit bytes.
bdrv_pad_request() is already good for 64bit requests, add
corresponding assertion.
Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part().
Type is widening, so callers are safe. Let's look inside the functions.
In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes
to other already int64_t interfaces (and some obviously safe
calculations), it's OK.
In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still
it's passed to bdrv_aligned_pwritev which supports int64_t bytes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, prepare bdrv_aligned_preadv() now.
Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is
only used for pass-through to bdrv_aligned_preadv().
All bdrv_aligned_preadv() callers are safe as type is widening. Let's
look inside:
- add a new-style assertion that request is good.
- callees bdrv_is_allocated(), bdrv_co_do_copy_on_readv() supports
int64_t bytes
- conversion of bytes_remaining is OK, as we never have requests
overflowing BDRV_MAX_LENGTH
- looping through bytes_remaining is ok, num is updated to int64_t
- for bdrv_driver_preadv we have same limit of max_transfer
- qemu_iovec_memset is OK, as bytes+qiov_offset should not overflow
qiov->size anyway (thanks to bdrv_check_qiov_request())
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-14-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, prepare bdrv_co_do_copy_on_readv() now.
'bytes' type widening, so callers are safe. Look at the function
itself:
bytes, skip_bytes and progress become int64_t.
bdrv_round_to_clusters() is OK, cluster_bytes now may be large.
trace_bdrv_co_do_copy_on_readv() is OK
looping through cluster_bytes is still OK.
pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we
are going to do COR operation. Therefor calculations in
qemu_iovec_from_buf() and bdrv_driver_preadv() should not change.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-13-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, prepare bdrv_aligned_pwritev() now and convert the dependencies:
bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed
type bytes.
Conversion of bdrv_co_write_req_prepare() and
bdrv_co_write_req_finish() is definitely safe, as all requests in
block/io must not overflow BDRV_MAX_LENGTH. Still add assertions.
For bdrv_aligned_pwritev() 'bytes' type is widened, so callers are
safe. Let's check usage of the parameter inside the function.
Passing to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish()
is OK.
Passing to qemu_iovec_* is OK after new assertion. All other callees
are already updated to int64_t.
Checking alignment is not changed, offset + bytes and qiov_offset +
bytes calculations are safe (thanks to new assertions).
max_transfer is kept to be int for now. It has a default of INT_MAX
here, and some drivers may rely on it. It's to be refactored later.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-12-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, prepare bdrv_co_do_pwrite_zeroes() now.
Callers are safe, as converting int to int64_t is safe. Concentrate on
'bytes' usage in the function (thx to Eric Blake):
compute 'int tail' via % 'int alignment' - safe
fragmentation loop 'int num' - still fragments with a cap on
max_transfer
use of 'num' within the loop
MIN(bytes, max_transfer) as well as %alignment - still works, so
calculations in if (head) {} are safe
clamp size by 'int max_write_zeroes' - safe
drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping
clamp size by 'int max_transfer' - safe
buf allocation is still clamped to max_transfer
qemu_iovec_init_buf(size_t) - safe because of clamping
bdrv_driver_pwritev(uint64_t) - safe
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver wrappers parameters which are already 64bit to
signed type.
Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less
than INT64_MAX), which makes the conversion to signed 64bit type safe.
Add corresponding assertions.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-10-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
All requests in block/io must not overflow BDRV_MAX_LENGTH, all
external users of BdrvTrackedRequest already have corresponding
assertions, so we are safe. Add some assertions still.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Operations with qiov add more restrictions on bytes, let's cover it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.
This patch is a first in the 64-bit-blocklayer series, so we are
generally moving to int64_t for both offset and bytes parameters on all
io paths. Main motivation is realization of 64-bit write_zeroes
operation for fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
Patch-correctness audit by Eric Blake:
Caller has 32-bit, this patch now causes widening which is safe:
block/block-backend.c: blk_do_preadv() passes 'unsigned int'
block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
block/throttle.c: throttle_co_pdiscard() passes 'int'
Caller has 64-bit, this patch fixes potential bug where pre-patch
could narrow, except it's easy enough to trace that callers are still
capped at 2G actions:
block/throttle.c: throttle_co_preadv() passes 'uint64_t'
block/throttle.c: throttle_co_pwritev() passes 'uint64_t'
Implementation in question: block/throttle-groups.c
throttle_group_co_io_limits_intercept() takes 'unsigned int bytes'
and uses it: argument to util/throttle.c throttle_account(uint64_t)
All safe: it patches a latent bug, and does not introduce any 64-bit
gotchas once throttle_co_p{read,write}v are relaxed, and assuming
throttle_account() is not buggy.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Make bdrv_pad_request() honest: return error if
qemu_iovec_init_extended() failed.
Update also bdrv_padding_destroy() to clean the structure for safety.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Prepare for the following patch when bdrv_pad_request() will be able to
fail. Update the comments.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
Calculation of sum may theoretically overflow, so use 64bit type and
add some good assertions.
Use int64_t constantly.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak assertion order]
Signed-off-by: Eric Blake <eblake@redhat.com>
Actually, we can't extend the io vector in all cases. Handle possible
MAX_IOV and size_t overflows.
For now add assertion to callers (actually they rely on success anyway)
and fix them in the following patch.
Add also some additional good assertions to qemu_iovec_init_slice()
while being here.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
It's better to pass &error_abort than just assert that result is 0: on
crash, we'll immediately see the reason in the backtrace.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix iotest 206 fallout]
Signed-off-by: Eric Blake <eblake@redhat.com>
size_to_str() can return a size like "4.24 MiB", with a single digit
integer part and two fractional digits. This is eight characters, but
commit b39847a5 changed the format string to only reserve seven
characters for the column.
This can result in unaligned columns, which in turn changes the output of
iotests case 267 because exceeding the column size defeats the attempt
to filter the size out of the output (observed with the ppc64 emulator).
The resulting change is only a whitespace change, but since commit
f203080b this is enough for iotests to consider the test failed.
Taking a character away from the tag name column and adding it to the VM
size column doesn't change anything in the common case (the tag name is
left justified, the VM size is right justified), but fixes this case.
Fixes: b39847a505
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210202155911.179865-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
NVMe controllers implement different versions of the spec,
and different features of it. It is useful to gather this
information when debugging.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 15b2260bef ("block/nvme: Trace controller capabilities")
misunderstood the doorbell stride value from the datasheet, use
the correct one. The 'doorbell_scale' variable used few lines
later is correct.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The easiest spots to use QAPI_LIST_APPEND are where we already have an
obvious pointer to the tail of a list. While at it, consistently use
the variable name 'tail' for that purpose.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210113221013.390592-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?
This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.
This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.
All calls of blk_is_read_only() are converted to one of the two new
functions.
Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When a call to fcntl(2) for the purpose of adding file locks fails
with an error other than EAGAIN or EACCES, report the error returned
by fcntl.
EAGAIN or EACCES are elided as they are considered to be common
failures, indicating that a conflicting lock is held by another
process.
No errors are elided when removing file locks.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210113164447.2545785-1-david.edmondson@oracle.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Drop unused code.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-18-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-17-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to stop use of this callback in the following commit.
Still the callback handling code will be dropped in a separate commit.
So, for now let's make it optional.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-16-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add new parameters to configure future backup features. The patch
doesn't introduce aio backup requests (so we actually have only one
worker) neither requests larger than one cluster. Still, formally we
satisfy these maximums anyway, so add the parameters now, to facilitate
further patch which will really change backup job behavior.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-11-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add function to cancel running async block-copy call. It will be used
in backup.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to directly use one async block-copy operation for backup
job, so we need rate limiter.
We want to maintain current backup behavior: only background copying is
limited and copy-before-write operations only participate in limit
calculation. Therefore we need one rate limiter for block-copy state
and boolean flag for block-copy call state for actual limitation.
Note, that we can't just calculate each chunk in limiter after
successful copying: it will not save us from starting a lot of async
sub-requests which will exceed limit too much. Instead let's use the
following scheme on sub-request creation:
1. If at the moment limit is not exceeded, create the request and
account it immediately.
2. If at the moment limit is already exceeded, drop create sub-request
and handle limit instead (by sleep).
With this approach we'll never exceed the limit more than by one
sub-request (which pretty much matches current backup behavior).
Note also, that if there is in-flight block-copy async call,
block_copy_kick() should be used after set-speed to apply new setup
faster. For that block_copy_kick() published in this patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
It simplifies debugging.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
They will be used for backup.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We'll need async block-copy invocation to use in backup directly.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Refactor common path to use BlockCopyCallState pointer as parameter, to
prepare it for use in asynchronous block-copy (at least, we'll need to
run block-copy in a coroutine, passing the whole parameters as one
pointer).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Experiments show, that copy_range is not always making things faster.
So, to make experimentation simpler, let's add a parameter. Some more
perf parameters will be added soon, so here is a new struct.
For now, add new backup qmp parameter with x- prefix for the following
reasons:
- We are going to add more performance parameters, some will be
related to the whole block-copy process, some only to background
copying in backup (ignored for copy-before-write operations).
- On the other hand, we are going to use block-copy interface in other
block jobs, which will need performance options as well.. And it
should be the same structure or at least somehow related.
So, there are too much unclean things about how the interface and now
we need the new options mostly for testing. Let's keep them
experimental for a while.
In do_backup_common() new x-perf parameter handled in a way to
make further options addition simpler.
We add use-copy-range with default=true, and we'll change the default
in further patch, after moving backup to use block-copy.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com>
[mreitz: s/5\.2/6.0/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch completes the series with the COR-filter applied to
block-stream operations.
Adding the filter makes it possible in future implement discarding
copied regions in backing files during the block-stream job, to reduce
the disk overuse (we need control on permissions).
Also, the filter now is smart enough to do copy-on-read with specified
base, so we have benefit on guest reads even when doing block-stream of
the part of the backing chain.
Several iotests are slightly modified due to filter insertion.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-14-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add a direct link to target bs for convenience and to simplify
following commit which will insert COR filter above target bs.
This is a part of original commit written by Andrey.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-13-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The code already don't freeze base node and we try to make it prepared
for the situation when base node is changed during the operation. In
other words, block-stream doesn't own base node.
Let's introduce a new interface which should replace the current one,
which will in better relations with the code. Specifying bottom node
instead of base, and requiring it to be non-filter gives us the
following benefits:
- drop difference between above_base and base_overlay, which will be
renamed to just bottom, when old interface dropped
- clean way to work with parallel streams/commits on the same backing
chain, which otherwise become a problem when we introduce a filter
for stream job
- cleaner interface. Nobody will surprised the fact that base node may
disappear during block-stream, when there is no word about "base" in
the interface.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Stream in stream_prepare calls bdrv_change_backing_file() to change
backing-file in the metadata of bs.
It may use either backing-file parameter given by user or just take
filename of base on job start.
Backing file format is determined by base on job finish.
There are some problems with this design, we solve only two by this
patch:
1. Consider scenario with backing-file unset. Current concept of stream
supports changing of the base during the job (we don't freeze link to
the base). So, we should not save base filename at job start,
- let's determine name of the base on job finish.
2. Using direct base to determine filename and format is not very good:
base node may be a filter, so its filename may be JSON, and format_name
is not good for storing into qcow2 metadata as backing file format.
- let's use unfiltered_base
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[vsementsov: change commit subject, change logic in stream_prepare]
Message-Id: <20201216061703.70908-10-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
If the flag BDRV_REQ_PREFETCH was set, skip idling read/write
operations in COR-driver. It can be taken into account for the
COR-algorithms optimization. That check is being made during the
block stream job by the moment.
Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the
COR-filter.
block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are
going to use it alone and pass it to the COR-filter driver for further
processing.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[vsementsov: use assert instead of abort]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add an option to limit copy-on-read operations to specified sub-chain
of backing-chain, to make copy-on-read filter useful for block-stream
job.
Suggested-by: Max Reitz <mreitz@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[vsementsov: change subject, modified to freeze the chain,
do some fixes]
Message-Id: <20201216061703.70908-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[vsementsov: comment indentation, s/Since: 5.2/Since: 6.0/]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-5-vsementsov@virtuozzo.com>
[mreitz: s/commit/stream/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Provide API for the COR-filter removal. Also, drop the filter child
permissions for an inactive state when the filter node is being
removed.
To insert the filter, the block generic layer function
bdrv_insert_node() can be used.
The new function bdrv_cor_filter_drop() may be considered as an
intermediate solution before the QEMU permission update system has
overhauled. Then we are able to implement the API function
bdrv_remove_node() on the block generic layer.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>