Commit Graph

301 Commits

Author SHA1 Message Date
Daniel P. Berrange c01c214b69 block: remove all encryption handling APIs
Now that all encryption keys must be provided upfront via
the QCryptoSecret API and associated block driver properties
there is no need for any explicit encryption handling APIs
in the block layer. Encryption can be handled transparently
within the block driver. We only retain an API for querying
whether an image is encrypted or not, since that is a
potentially useful piece of metadata to report to the user.

Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170623162419.26068-18-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11 17:44:56 +02:00
Daniel P. Berrange 0cb8d47ba9 block: deprecate "encryption=on" in favor of "encrypt.format=aes"
Historically the qcow & qcow2 image formats supported a property
"encryption=on" to enable their built-in AES encryption. We'll
soon be supporting LUKS for qcow2, so need a more general purpose
way to enable encryption, with a choice of formats.

This introduces an "encrypt.format" option, which will later be
joined by a number of other "encrypt.XXX" options. The use of
a "encrypt." prefix instead of "encrypt-" is done to facilitate
mapping to a nested QAPI schema at later date.

e.g. the preferred syntax is now

  qemu-img create -f qcow2 -o encrypt.format=aes demo.qcow2

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170623162419.26068-8-berrange@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11 17:44:55 +02:00
Manos Pitsidianakis f5a5ca7969 block: change variable names in BlockDriverState
Change the 'int count' parameter in *pwrite_zeros, *pdiscard related
functions (and some others) to 'int bytes', as they both refer to bytes.
This helps with code legibility.

Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26 14:54:46 +02:00
Paolo Bonzini b64bd51efa block: protect modification of dirty bitmaps with a mutex
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-17-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini 2119882c7e block: introduce dirty_bitmap_mutex
It protects only the list of dirty bitmaps; in the next patch we will
also protect their content.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-15-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini 3783fa3dd3 block: protect tracked_requests and flush_queue with reqs_lock
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-14-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini 47fec59941 block: access write_gen with atomics
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-13-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini f7946da274 block: use Stat64 for wr_highest_offset
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-12-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini 850d54a2a9 block: access io_plugged with atomic ops
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-7-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini e2a6ae7fe5 block: access wakeup with atomic ops
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-6-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini 20fc71b25c block: access serialising_in_flight with atomic ops
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-5-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini 414c2ec358 block: access quiesce_counter with atomic ops
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-3-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini d3faa13e5f block: access copy_on_read with atomic ops
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-2-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Juan Quintela 5e22479ae2 migration: Create include for migration snapshots
Start removing migration code from sysemu/sysemu.h.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-06-01 18:49:23 +02:00
Max Reitz 03c320d803 block/file-*: *_parse_filename() and colons
The file drivers' *_parse_filename() implementations just strip the
optional protocol prefix off the filename. However, for e.g.
"file:foo:bar", this would lead to "foo:bar" being stored as the BDS's
filename which looks like it should be managed using the "foo" protocol.
This is especially troublesome if you then try to resolve a backing
filename based on "foo:bar".

This issue can only occur if the stripped part is a relative filename
("file:/foo:bar" will be shortened to "/foo:bar" and having a slash
before the first colon means that "/foo" is not recognized as a protocol
part). Therefore, we can easily fix it by prepending "./" to such
filenames.

Before this patch:
$ ./qemu-img create -f qcow2 backing.qcow2 64M
Formatting 'backing.qcow2', fmt=qcow2 size=67108864 encryption=off
    cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./qemu-img create -f qcow2 -b backing.qcow2 file🔝image.qcow2
Formatting 'file🔝image.qcow2', fmt=qcow2 size=67108864
    backing_file=backing.qcow2 encryption=off cluster_size=65536
    lazy_refcounts=off refcount_bits=16
$ ./qemu-io file🔝image.qcow2
can't open device file🔝image.qcow2: Could not open backing file:
    Unknown protocol 'top'

After this patch:
$ ./qemu-io file🔝image.qcow2
[no error]

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170522195217.12991-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-29 15:39:54 +02:00
Eric Blake 4c41cb4955 block: Update comments on BDRV_BLOCK_* meanings
We had some conflicting documentation: a nice 8-way table that
described all possible combinations of DATA, ZERO, and
OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
always meant raw data could be read directly.  Furthermore, the
text refers a lot to bs->file, even though the interface was
updated back in 67a0fd2a to let the driver pass back a specific
BDS (not necessarily bs->file).  As the 8-way table is the
intended semantics, simplify the rest of the text to get rid of
the confusion.

ALLOCATED is always set by the block layer for convenience (drivers
do not have to worry about it).  RAW is used only internally, but
by more than the raw driver.  Document these additional items on
the driver callback.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-4-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11 14:28:06 +02:00
Kevin Wolf cfa1a5723f block: Drop permissions when migration completes
With image locking, permissions affect other qemu processes as well. We
want to be sure that the destination can run, so let's drop permissions
on the source when migration completes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Kevin Wolf 4417ab7adf block: New BdrvChildRole.activate() for blk_resume_after_migration()
Instead of manually calling blk_resume_after_migration() in migration
code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend
activation with cache invalidation into a single function. This is
achieved with a new callback in BdrvChildRole that is called by
bdrv_invalidate_cache_all().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11 12:08:24 +02:00
Fam Zheng 5a9347c673 block: Add, parse and store "force-share" option
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11 11:02:38 +02:00
Klim Kireev d4a7f45ec9 block: fix obvious coding style mistakes in block_int.h
Signed-off-by: Klim Kireev <proffk@virtuozzo.mipt.ru>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Message-id: 1491405505-31620-2-git-send-email-den@openvz.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-28 16:02:03 +02:00
Max Reitz 4bff28b81a block: Add errp to BD.bdrv_truncate()
Add an Error parameter to the block drivers' bdrv_truncate() interface.
If a block driver does not set this in case of an error, the generic
bdrv_truncate() implementation will do so.

Where it is obvious, this patch also makes some block drivers set this
value.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170328205129.15138-4-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-28 16:02:03 +02:00
Fam Zheng 78bbd910bb block: Make errp the last parameter of commit_active_start
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170421122710.15373-9-famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-04-24 09:13:44 +02:00
Fam Zheng c1cef67251 block: Always call bdrv_child_check_perm first
bdrv_child_set_perm alone is not very usable because the caller must
call bdrv_child_check_perm first. This is already encapsulated
conveniently in bdrv_child_try_set_perm, so remove the other prototypes
from the header and fix the one wrong caller, block/mirror.c.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-03-17 12:54:06 +01:00
Kevin Wolf 5fe31c25cc block: Fix error handling in bdrv_replace_in_backing_chain()
When adding an Error parameter, bdrv_replace_in_backing_chain() would
become nothing more than a wrapper around change_parent_backing_link().
So make the latter public, renamed as bdrv_replace_node(), and remove
bdrv_replace_in_backing_chain().

Most of the callers just remove a node from the graph that they just
inserted, so they can use &error_abort, but completion of a mirror job
with 'replaces' set can actually fail.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-07 14:53:28 +01:00
Kevin Wolf 46181129ea block: Ignore multiple children in bdrv_check_update_perm()
change_parent_backing_link() will need to update multiple BdrvChild
objects at once. Checking permissions reference by reference doesn't
work because permissions need to be consistent only with all parents
moved to the new child.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-07 14:53:28 +01:00
Kevin Wolf 0db832f42e commit: Add filter-node-name to block-commit
Management tools need to be able to know about every node in the graph
and need a way to address them. Changing the graph structure was okay
because libvirt doesn't really manage the node level yet, but future
libvirt versions need to deal with both new and old version of qemu.

This new option to blockdev-commit allows the client to set a node-name
for the automatically inserted filter driver, and at the same time
serves as a witness for a future libvirt that this version of qemu does
automatically insert a filter driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:47:50 +01:00
Kevin Wolf 6cdbceb12c mirror: Add filter-node-name to blockdev-mirror
Management tools need to be able to know about every node in the graph
and need a way to address them. Changing the graph structure was okay
because libvirt doesn't really manage the node level yet, but future
libvirt versions need to deal with both new and old version of qemu.

This new option to blockdev-mirror allows the client to set a node-name
for the automatically inserted filter driver, and at the same time
serves as a witness for a future libvirt that this version of qemu does
automatically insert a filter driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:47:50 +01:00
Kevin Wolf db95dbba3b block: BdrvChildRole.attach/detach() callbacks
Backing files are somewhat special compared to other kinds of children
because they are attached and detached using bdrv_set_backing_hd()
rather than the normal set of functions, which does a few more things
like setting backing blockers, toggling the BDRV_O_NO_BACKING flag,
setting parent_bs->backing_file, etc.

These special features are a reason why change_parent_backing_link()
can't handle backing files yet. With abstracting the additional features
into .attach/.detach callbacks, we get a step closer to a function that
can actually deal with this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:37 +01:00
Kevin Wolf 26de9438c1 block: Add BdrvChildRole.stay_at_node
When the parents' child links are updated in bdrv_append() or
bdrv_replace_in_backing_chain(), this should affect all child links of
BlockBackends or other nodes, but not on child links held for other
purposes (like for setting permissions). This patch allows to control
the behaviour per BdrvChildRole.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:37 +01:00
Kevin Wolf b541155587 block: Add BdrvChildRole.get_parent_desc()
For meaningful error messages in the permission system, we need to get
some human-readable description of the parent of a BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:37 +01:00
Kevin Wolf 39829a01ae block: Allow error return in BlockDevOps.change_media_cb()
Some devices allow a media change between read-only and read-write
media. They need to adapt the permissions in their .change_media_cb()
implementation, which can fail. So add an Error parameter to the
function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:36 +01:00
Kevin Wolf 91ef38257a vvfat: Implement .bdrv_child_perm()
vvfat is the last remaining driver that can have children, but doesn't
implement .bdrv_child_perm() yet. The default handlers aren't suitable
here, so let's implement a very simple driver-specific one that protects
the internal child from being used by other users as good as our
permissions permit.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:36 +01:00
Kevin Wolf 6b1a044afb block: Default .bdrv_child_perm() for format drivers
Almost all format drivers have the same characteristics as far as
permissions are concerned: They have one or more children for storing
their own data and, more importantly, metadata (can be written to and
grow even without external write requests, must be protected against
other writers and present consistent data) and optionally a backing file
(this is just data, so like for a filter, it only depends on what the
parent nodes need).

This provides a default implementation that can be shared by most of
our format drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:40:36 +01:00
Kevin Wolf 6a1b9ee152 block: Default .bdrv_child_perm() for filter drivers
Most filters need permissions related to read and write for their
children, but only if the node has a parent that wants to use the same
operation on the filter. The same is true for resize.

This adds a default implementation that simply forwards all necessary
permissions to all children of the node and leaves the other permissions
unchanged.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:40:36 +01:00
Kevin Wolf 33a610c398 block: Involve block drivers in permission granting
In many cases, the required permissions of one node on its children
depend on what its parents require from it. For example, the raw format
or most filter drivers only need to request consistent reads if that's
something that one of their parents wants.

In order to achieve this, this patch introduces two new BlockDriver
callbacks. The first one lets drivers first check (recursively) whether
the requested permissions can be set; the second one actually sets the
new permission bitmask.

Also add helper functions that drivers can use in their implementation
of the callbacks to update their permissions on a specific child.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:40:36 +01:00
Kevin Wolf d5e6f437c5 block: Let callers request permissions when attaching a child node
When attaching a node as a child to a new parent, the required and
shared permissions for this parent are checked against all other parents
of the node now, and an error is returned if there is a conflict.

This allows error returns to a function that previously always
succeeded, and the same is true for quite a few callers and their
callers. Converting all of them within the same patch would be too much,
so for now everyone tells that they don't need any permissions and allow
everyone else to do anything. This way we can use &error_abort initially
and convert caller by caller to pass actual permission requirements and
implement error handling.

All these places are marked with FIXME comments and it will be the job
of the next patches to clean them up again.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:36 +01:00
Paolo Bonzini 91bcea4899 block: document fields protected by AioContext lock
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-19-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:39:40 +00:00
Paolo Bonzini 8f90b5e91d block: get rid of bdrv_io_unplugged_begin/end
bdrv_io_plug and bdrv_io_unplug are only called (via their
BlockBackend equivalents) after starting asynchronous I/O.
bdrv_drain is not going to be called while they are running,
because---even if a coroutine runs for some reason---it will
only drain in the next iteration of the event loop through
bdrv_co_yield_to_drain.

So this mechanism is unnecessary, get rid of it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161129113334.605-1-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-16 13:25:17 +00:00
Eric Blake c1bb86cd8a block: Rename raw-{posix,win32} to file-*.c
These files deal with the file protocol, not the raw format (the
file protocol is often used with other formats, and the raw
format is not forced to use the file protocol).  Rename things
to make it a bit easier to follow.

Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-01-09 13:30:53 +01:00
John Snow 111049a4ec blockjob: refactor backup_start as backup_job_create
Refactor backup_start as backup_job_create, which only creates the job,
but does not automatically start it. The old interface, 'backup_start',
is not kept in favor of limiting the number of nearly-identical interfaces
that would have to be edited to keep up with QAPI changes in the future.

Callers that wish to synchronously start the backup_block_job can
instead just call block_job_start immediately after calling
backup_job_create.

Transactions are updated to use the new interface, calling block_job_start
only during the .commit phase, which helps prevent race conditions where
jobs may finish before we even finish building the transaction. This may
happen, for instance, during empty block backup jobs.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1478587839-9834-6-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
John Snow 8254b6d953 blockjob: centralize QMP event emissions
There's no reason to leave this to blockdev; we can do it in blockjobs
directly and get rid of an extra callback for most users.

All non-internal events, even those created outside of QMP, will
consistently emit events.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-5-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
John Snow 47970dfb0a Replication/Blockjobs: Create replication jobs as internal
Bubble up the internal interface to commit and backup jobs, then switch
replication tasks over to using this methodology.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-4-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Paolo Bonzini c9d1a56174 block: only call aio_poll on the current thread's AioContext
aio_poll is not thread safe; for example bdrv_drain can hang if
the last in-flight I/O operation is completed in the I/O thread after
the main thread has checked bs->in_flight.

The bug remains latent as long as all of it is called within
aio_context_acquire/aio_context_release, but this will change soon.

To fix this, if bdrv_drain is called from outside the I/O thread,
signal the main AioContext through a dummy bottom half.  The event
loop then only runs in the I/O thread.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1477565348-5458-18-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Paolo Bonzini 9972354856 block: add BDS field to count in-flight requests
Unlike tracked_requests, this field also counts throttled requests,
and remains non-zero if an AIO operation needs a BH to be "really"
completed.

With this change, it is no longer necessary to have a dummy
BdrvTrackedRequest for requests that are never serialising, and
it is no longer necessary to poll the AioContext once after
bdrv_requests_pending(bs) returns false.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-5-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28 21:50:18 +08:00
Kevin Wolf 16a389dc9e block: Introduce .bdrv_co_ioctl() driver callback
This allows drivers to implement ioctls in a coroutine-based way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-10-27 19:05:23 +02:00
Yaowei Bai e7e4f9f950 block: mirror: fix wrong comment of mirror_start
Obviously, we should write to '@target'.

Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Reviewed-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1473851019-7005-2-git-send-email-baiyaowei@cmss.chinamobile.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28 11:21:46 +01:00
Ladi Prosek d4b84d564e Remove unused function declarations
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-15 15:32:22 +03:00
Wen Congyang b49f7ead8d mirror: auto complete active commit
Auto complete mirror job in background to prevent from
blocking synchronously

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13 11:00:56 +01:00
Pavel Butsykin 13b9414b57 drive-backup: added support for data compression
The idea is simple - backup is "written-once" data. It is written block
by block and it is large enough. It would be nice to save storage
space and compress it.

The patch adds a flag to the qmp/hmp drive-backup command which enables
block compression. Compression should be implemented in the format driver
to enable this feature.

There are some limitations of the format driver to allow compressed writes.
We can write data only once. Though for backup this is perfectly fine.
These limitations are maintained by the driver and the error will be
reported if we are doing something wrong.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 35fadca80e block: remove BlockDriver.bdrv_write_compressed
There are no block drivers left that implement the old
.bdrv_write_compressed interface, so it can be removed. Also now we have
no need to use the bdrv_pwrite_compressed function and we can remove it
entirely.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Pavel Butsykin 29a298af9d block/io: reuse bdrv_co_pwritev() for write compressed
For bdrv_pwrite_compressed() it looks like most of the code creating
coroutine is duplicated in bdrv_prwv_co(). So we can just add a flag
(BDRV_REQ_WRITE_COMPRESSED) and use bdrv_prwv_co() as a generic one.
In the end we get coroutine oriented function for write compressed by using
bdrv_co_pwritev/blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED flag.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05 19:06:48 +02:00
Evgeny Yakovlev ce83ee57f6 block: fix deadlock in bdrv_co_flush
The following commit
    commit 3ff2f67a7c
    Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
    Date:   Mon Jul 18 22:39:52 2016 +0300
    block: ignore flush requests when storage is clean
has introduced a regression.

There is a problem that it is still possible for 2 requests to execute
in non sequential fashion and sometimes this results in a deadlock
when bdrv_drain_one/all are called for BDS with such stalled requests.

1. Current flushed_gen and flush_started_gen is 1.
2. Request 1 enters bdrv_co_flush to with write_gen 1 (i.e. the same
   as flushed_gen). It gets past flushed_gen != flush_started_gen and
   sets flush_started_gen to 1 (again, the same it was before).
3. Request 1 yields somewhere before exiting bdrv_co_flush
4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past
   flushed_gen != flush_started_gen and sets flush_started_gen to 2.
5. Request 2 runs to completion and sets flushed_gen to 2
6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
   However flush_started_gen is now 2.

From here on out flushed_gen is always != to flush_started_gen and all
further requests will wait on flush_queue. This change replaces
flush_started_gen with an explicitly tracked active flush request.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1471457214-3994-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-18 14:36:49 +01:00
Eric Blake b8d0a9804d block: Cater to iscsi with non-power-of-2 discard
Dell Equallogic iSCSI SANs have a very unusual advertised geometry:

$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720

which says that both the maximum and the optimal discard size
is 15M.  It is not immediately apparent if the device allows
discard requests not aligned to the optimal size, nor if it
allows discards at a finer granularity than the optimal size.

I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are
permitted, but while that document mentions a "Block Limits
VPD Page", I couldn't actually find documentation of that page
or what values it would have, or if a SCSI device has an
advertisement of its minimal unmap granularity.  So it is not
obvious to me whether the Dell Equallogic device is compliance
with the SCSI specification.

Fortunately, it is easy enough to support non-power-of-2 sizing,
even if it means we are less efficient than truly possible when
targetting that device (for example, it means that we refuse to
unmap anything that is not a multiple of 15M and aligned to a
15M boundary, even if the device truly does support a smaller
granularity where unmapping actually works).

Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-03 18:44:57 +02:00
Peter Maydell 61ead113ae Pull request
v2:
  * Resolved merge conflict with block/iscsi.c [Peter]
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXj6TkAAoJEJykq7OBq3PI15oIAL24OFnMTFOAWyY2h3yfzGwe
 Gn1qzFOTHCXgbgVsolVuZnFU/SDn9WzMS9y6o7jcJykv++FdMSYUO7paPQAg9etX
 o9KOgyfUUDhskbaXEbTKxhqy7UvcyJsZYmjN+TD5tupfnrqGlmu6rkUPpSwGo++G
 u7CkAZewJc5SvsCdQRe3N10LbG4xU/j06cjCIHWDPRr+AV9qvsb9KkIOg3wkrhWN
 R83yj+yX6vhfQouHLoOlh3RYZ5HNo020JPum0jT1tGcoshKoFyu0prSRqNVVMqTY
 BepvWMhXbiq219HnFaQBtnysve9gWix4WhuCHtaAlaZDTBh945ZztsW7w7dubtA=
 =7ics
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

v2:
 * Resolved merge conflict with block/iscsi.c [Peter]

# gpg: Signature made Wed 20 Jul 2016 17:20:52 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request: (25 commits)
  raw_bsd: Convert to byte-based interface
  nbd: Convert to byte-based interface
  block: Kill .bdrv_co_discard()
  sheepdog: Switch .bdrv_co_discard() to byte-based
  raw_bsd: Switch .bdrv_co_discard() to byte-based
  qcow2: Switch .bdrv_co_discard() to byte-based
  nbd: Switch .bdrv_co_discard() to byte-based
  iscsi: Switch .bdrv_co_discard() to byte-based
  gluster: Switch .bdrv_co_discard() to byte-based
  blkreplay: Switch .bdrv_co_discard() to byte-based
  block: Add .bdrv_co_pdiscard() driver callback
  block: Convert .bdrv_aio_discard() to byte-based
  rbd: Switch rbd_start_aio() to byte-based
  raw-posix: Switch paio_submit() to byte-based
  block: Convert BB interface to byte-based discards
  block: Convert bdrv_aio_discard() to byte-based
  block: Switch BlockRequest to byte-based
  block: Convert bdrv_discard() to byte-based
  block: Convert bdrv_co_discard() to byte-based
  iscsi: Rely on block layer to break up large requests
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	block/gluster.c
2016-07-21 11:00:36 +01:00
Eric Blake 02aefe43cb block: Kill .bdrv_co_discard()
Now that all drivers have a byte-based .bdrv_co_pdiscard(), we
no longer need to worry about the sector-based version.  We can
also relax our minimum alignment to 1 for drivers that support it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:24:25 +01:00
Eric Blake 47a5486d59 block: Add .bdrv_co_pdiscard() driver callback
There's enough drivers with a sector-based callback that it will
be easier to switch one at a time.  This patch adds a byte-based
callback, and then after all drivers are swapped, we'll drop the
sector-based callback.

[checkpatch doesn't like the space after coroutine_fn in
block_int.h, but it's consistent with the rest of the file]

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Eric Blake 4da444a0bb block: Convert .bdrv_aio_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based driver callback .bdrv_aio_discard() with a new
byte-based .bdrv_aio_pdiscard().  Only raw-posix and RBD drivers
are affected, so it was not worth splitting into multiple patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Denis V. Lunev 6d07859926 dirty-bitmap: operate with int64_t amount
Underlying HBitmap operates even with uint64_t. Thus this change is safe.
This would be useful f.e. to mark entire bitmap dirty in one call.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-19 16:54:46 -04:00
Evgeny Yakovlev 3ff2f67a7c block: ignore flush requests when storage is clean
Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).

This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18 18:19:01 -04:00
Alberto Garcia fd62c609ed commit: Add 'job-id' parameter to 'block-commit'
This patch adds a new optional 'job-id' parameter to 'block-commit',
allowing the user to specify the ID of the block job to be created.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 2323322ed0 stream: Add 'job-id' parameter to 'block-stream'
This patch adds a new optional 'job-id' parameter to 'block-stream',
allowing the user to specify the ID of the block job to be created.

The HMP 'block_stream' command remains unchanged.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 70559d499c backup: Add 'job-id' parameter to 'blockdev-backup' and 'drive-backup'
This patch adds a new optional 'job-id' parameter to 'blockdev-backup'
and 'drive-backup', allowing the user to specify the ID of the block
job to be created.

The HMP 'drive_backup' command remains unchanged.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 71aa98678c mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror'
This patch adds a new optional 'job-id' parameter to 'blockdev-mirror'
and 'drive-mirror', allowing the user to specify the ID of the block
job to be created.

The HMP 'drive_mirror' command remains unchanged.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Alberto Garcia 29338003c9 stream: Fix prototype of stream_start()
'stream-start' has a parameter called 'backing-file', which is the
string to be written to bs->backing when the job finishes.

In the stream_start() implementation it is called 'backing_file_str',
but it the prototype in the header file it is called 'base_id'.

This patch fixes it so the name is the same in both cases and is
consistent with other cases (like commit_start()).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13 13:26:02 +02:00
Kevin Wolf a03ef88f77 block: Convert bdrv_co_preadv/pwritev to BdrvChild
This is the final patch for converting the common I/O path to take
a BdrvChild parameter instead of BlockDriverState.

The completion of this conversion means that all users that perform I/O
on an image need to actually hold a reference (in the form of BdrvChild,
possible as part of a BlockBackend) to that image. This also protects
against inconsistent use of BlockBackend vs. BlockDriverState functions
because direct use of a BlockDriverState isn't possible any more and
blk->root is private for block-backends.c.

In addition, we can now distinguish different users in the I/O path,
and the future op blockers work is going to add assertions based on
permissions stored in BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Eric Blake 5411541270 block: Use bool as appropriate for BDS members
Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:26 +02:00
Eric Blake a5b8dd2ce8 block: Move request_alignment into BlockLimit
It makes more sense to have ALL block size limit constraints
in the same struct.  Improve the documentation while at it.

Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:26 +02:00
Eric Blake b9f7855a50 block: Switch discard length bounds to byte-based
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  The BlockLimits type is now completely
byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
longer needed.

pdiscard_alignment is made unsigned (we use power-of-2 alignments
as bitmasks, where unsigned is easier to think about) while
leaving max_pdiscard signed (since we still have an 'int'
interface); this is comparable to what commit cf081fc did for
write zeroes limits.  We may later want to make everything an
unsigned 64-bit limit - but that requires a bigger code audit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 29cc6a6834 block: Wording tweaks to write zeroes limits
Improve the documentation of the write zeroes limits, to mention
additional constraints that drivers should observe.  Worth squashing
into commit cf081fca, if that hadn't been pushed already :)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Eric Blake 5def6b80e1 block: Switch transfer length bounds to byte-based
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length.  Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation.  Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

When a value comes from an external source (iscsi and raw-posix),
sanitize the results to ensure that opt_transfer is a power of 2.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05 16:46:25 +02:00
Stefan Hajnoczi e8a095dadb block: use safe iteration over AioContext notifiers
It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called.  In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.

Use safe iteration to avoid crashing when the notifier list is modified
during iteration.  We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.

The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com
2016-06-20 14:25:41 +01:00
Max Reitz 274fccee2b block/mirror: Fix target backing BDS
Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16 15:20:37 +02:00
Kevin Wolf c9d20029f4 block: Remove bs->zero_beyond_eof
It is always true for open images now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf 1a8ae82217 block: Make bdrv_load/save_vmstate coroutine_fns
This allows drivers to share code between normal I/O and vmstate
accesses.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:56 +02:00
Kevin Wolf 5ddda0b8f0 block: Make .bdrv_load_vmstate() vectored
This brings it in line with .bdrv_save_vmstate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16 15:19:55 +02:00
Eric Blake c1499a5e73 block: Kill bdrv_co_write_zeroes()
Now that all drivers have been converted to a byte interface,
we no longer need a sector interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake d05aa8bb4a block: Add .bdrv_co_pwrite_zeroes()
Update bdrv_co_do_write_zeroes() to be byte-based, and select
between the new byte-based bdrv_co_pwrite_zeroes() or the old
bdrv_co_write_zeroes().  The next patches will convert drivers,
then remove the old interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Eric Blake cf081fca4e block: Track write zero limits in bytes
Another step towards removing sector-based interfaces: convert
the maximum write and minimum alignment values from sectors to
bytes.  Rename the variables to let the compiler check that all
users are converted to the new semantics.

The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS
is constrained by INT_MAX (this means that we can't even
support a 2G write_zeroes, but just under it) - changing
operation lengths to unsigned or to 64-bits is a much bigger
audit, and debatable if we even want to do it (since at the
core, a 32-bit platform will still have ssize_t as its
underlying limit on write()).

Meanwhile, alignment is changed to 'uint32_t', since it makes no
sense to have an alignment larger than the maximum write, and
less painful to use an unsigned type with well-defined behavior
in bit operations than to have to worry about what happens if
a driver mistakenly supplies a negative alignment.

Add an assert that no one was trying to use sectors to get a
write zeroes larger than 2G, and therefore that a later conversion
to bytes won't be impacted by keeping the limit at 32 bits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08 10:21:08 +02:00
Kevin Wolf 36fe13317b block: Fix reconfiguring graph with drained nodes
When changing the BlockDriverState that a BdrvChild points to while the
node is currently drained, we must call the .drained_end() parent
callback. Conversely, when this means attaching a new node that is
already drained, we need to call .drained_begin().

bdrv_root_attach_child() takes now an opaque parameter, which is needed
because the callbacks must also be called if we're attaching a new child
to the BlockBackend when the root node is already drained, and they need
a way to identify the BlockBackend. Previously, child->opaque was set
too late and the callbacks would still see it as NULL.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf 1f0c461b82 block: Remove BlockDriverState.blk
This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.

Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 4c265bf9f4 block: User BdrvChild callback for device name
In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 5c8cab4808 block: Use BdrvChild callbacks for change_media/resize
We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf 7ca7f0f6db block: Decouple throttling from BlockDriverState
This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.

With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf c2066af051 block: Drain throttling queue with BdrvChild callback
This removes the last part of I/O throttling from block/io.c and moves
it to the BlockBackend.

Instead of having knowledge about throttling inside io.c, we can call a
BdrvChild callback .drained_begin/end, which happens to drain the
throttled requests for BlockBackend parents.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 22aa8b246a block: Introduce BdrvChild.opaque
BlockBackends use it to get a back pointer from BdrvChild to
BlockBackend in any BdrvChildRole callbacks.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 97148076e8 block: Move I/O throttling configuration functions to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 27ccdd5259 block: Move throttling fields from BDS to BB
This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf 31dce3ccca block: throttle-groups: Use BlockBackend pointers internally
As a first step towards moving I/O throttling to the BlockBackend level,
this patch changes all pointers in struct ThrottleGroup from referencing
a BlockDriverState to referencing a BlockBackend.

This change is valid because we made sure that throttling can only be
enabled on BDSes which have a BB attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Wen Congyang e06018ad28 Add new block driver interface to add/delete a BDS's child
In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Eric Blake 465fe887cc block: Honor BDRV_REQ_FUA during write_zeroes
The block layer has a couple of cases where it can lose
Force Unit Access semantics when writing a large block of
zeroes, such that the request returns before the zeroes
have been guaranteed to land on underlying media.

SCSI does not support FUA during WRITESAME(10/16); FUA is only
supported if it falls back to WRITE(10/16).  But where the
underlying device is new enough to not need a fallback, it
means that any upper layer request with FUA semantics was
silently ignoring BDRV_REQ_FUA.

Conversely, NBD has situations where it can support FUA but not
ZERO_WRITE; when that happens, the generic block layer fallback
to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
2.6) was losing the FUA flag.

The problem of losing flags unrelated to ZERO_WRITE has been
latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
back then, it did not matter because there was no FUA flag.  It
became observable when commit 93f5e6d8 paved the way for flags
that can impact correctness, when we should have been using
bdrv_co_writev_flags() with modified flags.  Compare to commit
9eeb6dd, which got flag manipulation right in
bdrv_co_do_zero_pwritev().

Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.

The fix is do to a cleanup bdrv_co_flush() at the end of the
operation if any step in the middle relied on a BDS that does
not natively support FUA for that step (note that we don't
need to flush after every operation, if the operation is broken
into chunks based on bounce-buffer sizing).  Each BDS gains a
new flag .supported_zero_flags, which parallels the use of
.supported_write_flags but only when accessing a zero write
operation (the flags MUST be different, because of SCSI having
different semantics based on WRITE vs. WRITESAME; and also
because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).

Also fix some documentation to describe -ENOTSUP semantics,
particularly since iscsi depends on those semantics.

Down the road, we may want to add a driver where its
.bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
this via bs->supported_write_flags for blocks opened by that
driver; such a driver should NOT supply .bdrv_co_write_zeroes
nor .supported_zero_flags.  But none of the drivers touched
in this patch want to do that (the act of writing zeroes is
different enough from normal writes to deserve a second
callback).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake 4df863f336 block: Make supported_write_flags a per-bds property
Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer.  But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).

The fix is to make supported flags as a per-BDS option, set during
.bdrv_open().  This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Janne Karhunen f249924e96 Allow users to specify the vmdk virtual hardware version.
Vmdk images have metadata to indicate the vmware virtual
hardware version image was created/tested to run with.
Allow users to specify that version via new 'hwversion'
option.

[ kwolf: Adjust qemu-iotests common.filter ]

Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf e3ddef25e9 block: Remove BlockDriver.bdrv_read/write
There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 3fb06697ae block: Introduce .bdrv_co_preadv/pwritev BlockDriver function
Many parts of the block layer are already byte granularity. The block
driver interface, however, was still missing an interface that allows
making use of this. This patch introduces a new BlockDriver interface,
which is based on coroutines, vectored, has flags and uses a byte
granularity. This is now the preferred interface for new drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf cab3a3563c block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev
It used to be an internal helper function just for implementing
bdrv_co_do_readv/writev(), but now that it's a public interface, it
deserves a name without "do" in it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Paolo Bonzini 6b98bd6495 block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini ce0f141259 block: introduce bdrv_no_throttling_begin/end
Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 93f5e6d88a block: Introduce bdrv_co_writev_flags()
This function will allow drivers to implement BDRV_REQ_FUA natively
instead of sending a separate flush after the write.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf bfd18d1e0b block: Move enable_write_cache to BB level
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.

Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Pavel Dovgalyuk c32b82afaf block: add flush callback
This patch adds callback for flush request. This callback is responsible
for flushing whole block devices stack. bdrv_flush function does not
proceed to underlying devices. It should be performed by this callback
function, if needed.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00