Commit Graph

3739 Commits

Author SHA1 Message Date
Alberto Garcia 13f893c47d qcow2: Rename l2_table in count_contiguous_clusters()
This function doesn't need any changes to support L2 slices, but since
it's now dealing with slices intead of full tables, the l2_table
variable is renamed for clarity.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 812b0c3505bb1687e51285dccf1a94f0cecb1f74.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia e4e7254829 qcow2: Rename l2_table in qcow2_alloc_compressed_cluster_offset()
This function doesn't need any changes to support L2 slices, but since
it's now dealing with slices instead of full tables, the l2_table
variable is renamed for clarity.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 0c5d4b9bf163aa3b49ec19cc512a50d83563f2ad.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 13bec229d8 qcow2: Update qcow2_truncate() to support L2 slices
The qcow2_truncate() code is mostly independent from whether
we're using L2 slices or full L2 tables, but in full and
falloc preallocation modes new L2 tables are allocated using
qcow2_alloc_cluster_link_l2().  Therefore the code needs to be
modified to ensure that all nb_clusters that are processed in each
call can be allocated with just one L2 slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1fd7d272b5e7b66254a090b74cf2bed1cc334c0e.1517840877.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 415184f520 qcow2: Update expand_zero_clusters_in_l1() to support L2 slices
expand_zero_clusters_in_l1() expands zero clusters as a necessary step
to downgrade qcow2 images to a version that doesn't support metadata
zero clusters. This function takes an L1 table (which may or may not
be active) and iterates over all its L2 tables looking for zero
clusters.

Since we'll be loading L2 slices instead of full tables we need to add
an extra loop that iterates over all slices of each L2 table, and we
should also use the slice size when allocating the buffer used when
the L1 table is not active.

This function doesn't need any additional changes so apart from that
this patch simply updates the variable name from l2_table to l2_slice.

Finally, and since we have to touch the bdrv_read() / bdrv_write()
calls anyway, this patch takes the opportunity to replace them with
the byte-based bdrv_pread() / bdrv_pwrite().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 43590976f730501688096cff103f2923b72b0f32.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 226494ff69 qcow2: Prepare expand_zero_clusters_in_l1() for adding L2 slice support
Adding support for L2 slices to expand_zero_clusters_in_l1() needs
(among other things) an extra loop that iterates over all slices of
each L2 table.

Putting all changes in one patch would make it hard to read because
all semantic changes would be mixed with pure indentation changes.

To make things easier this patch simply creates a new block and
changes the indentation of all lines of code inside it. Thus, all
modifications in this patch are cosmetic. There are no semantic
changes and no variables are renamed yet. The next patch will take
care of that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: c2ae9f31ed5b6e591477ad4654448badd1c89d73.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 9b765486b7 qcow2: Read refcount before L2 table in expand_zero_clusters_in_l1()
At the moment it doesn't really make a difference whether we call
qcow2_get_refcount() before of after reading the L2 table, but if we
want to support L2 slices we'll need to read the refcount first.

This patch simply changes the order of those two operations to prepare
for that. The patch with the actual semantic changes will be easier to
read because of this.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 947a91d934053a2dbfef979aeb9568f57ef57c5d.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 83ad165be7 qcow2: Update qcow2_update_snapshot_refcount() to support L2 slices
qcow2_update_snapshot_refcount() increases the refcount of all
clusters of a given snapshot. In order to do that it needs to load all
its L2 tables and iterate over their entries. Since we'll be loading
L2 slices instead of full tables we need to add an extra loop that
iterates over all slices of each L2 table.

This function doesn't need any additional changes so apart from that
this patch simply updates the variable name from l2_table to l2_slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 5f4db199b9637f4833b58487135124d70add8cf0.1517840877.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia ca62dd5c2b qcow2: Prepare qcow2_update_snapshot_refcount() for adding L2 slice support
Adding support for L2 slices to qcow2_update_snapshot_refcount() needs
(among other things) an extra loop that iterates over all slices of
each L2 table.

Putting all changes in one patch would make it hard to read because
all semantic changes would be mixed with pure indentation changes.

To make things easier this patch simply creates a new block and
changes the indentation of all lines of code inside it. Thus, all
modifications in this patch are cosmetic. There are no semantic
changes and no variables are renamed yet. The next patch will take
care of that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 8ffaa5e55bd51121f80e498f4045b64902a94293.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia a9a9f8f0b6 qcow2: Update zero_single_l2() to support L2 slices
zero_single_l2() limits the number of clusters to be zeroed to the
amount that fits inside an L2 table. Since we'll be loading L2 slices
instead of full tables we need to update that limit. The function is
renamed to zero_in_l2_slice() for clarity.

Apart from that, this function doesn't need any additional changes, so
this patch simply updates the variable name from l2_table to l2_slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: ebc16e7e79fa6969d8975ef487d679794de4fbcc.1517840877.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 21ab3add9c qcow2: Update discard_single_l2() to support L2 slices
discard_single_l2() limits the number of clusters to be discarded
to the amount that fits inside an L2 table. Since we'll be loading
L2 slices instead of full tables we need to update that limit. The
function is renamed to discard_in_l2_slice() for clarity.

Apart from that, this function doesn't need any additional changes, so
this patch simply updates the variable name from l2_table to l2_slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1cb44a5b68be5334cb01b97a3db3a3c5a43396e5.1517840877.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 6d99a34447 qcow2: Update handle_alloc() to support L2 slices
handle_alloc() loads an L2 table and limits the number of checked
clusters to the amount that fits inside that table. Since we'll be
loading L2 slices instead of full tables we need to update that limit.

Apart from that, this function doesn't need any additional changes, so
this patch simply updates the variable name from l2_table to l2_slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b243299c7136f7014c5af51665431ddbf5e99afd.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia cde917662a qcow2: Update handle_copied() to support L2 slices
handle_copied() loads an L2 table and limits the number of checked
clusters to the amount that fits inside that table. Since we'll be
loading L2 slices instead of full tables we need to update that limit.

Apart from that, this function doesn't need any additional changes, so
this patch simply updates the variable name from l2_table to l2_slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 541ac001a7d6b86bab2392554bee53c2b312148c.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia a002c0b09d qcow2: Update qcow2_alloc_cluster_link_l2() to support L2 slices
There's a loop in this function that iterates over the L2 entries in a
table, so now we need to assert that it remains within the limits of
an L2 slice.

Apart from that, this function doesn't need any additional changes, so
this patch simply updates the variable name from l2_table to l2_slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: f9846a1c2efc51938e877e2a25852d9ab14797ff.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia fd630039c0 qcow2: Update qcow2_get_cluster_offset() to support L2 slices
qcow2_get_cluster_offset() checks how many contiguous bytes are
available at a given offset. The returned number of bytes is limited
by the amount that can be addressed without having to load more than
one L2 table.

Since we'll be loading L2 slices instead of full tables this patch
changes the limit accordingly using the size of the L2 slice for the
calculations instead of the full table size.

One consequence of this is that with small L2 slices operations such
as 'qemu-img map' will need to iterate in more steps because each
qcow2_get_cluster_offset() call will potentially return a smaller
number. However the code is already prepared for that so this doesn't
break semantics.

The l2_table variable is also renamed to l2_slice to reflect this, and
offset_to_l2_index() is replaced with offset_to_l2_slice_index().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 6b602260acb33da56ed6af9611731cb7acd110eb.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia c03bfc5bba qcow2: Update get_cluster_table() to support L2 slices
This patch updates get_cluster_table() to return L2 slices instead of
full L2 tables.

The code itself needs almost no changes, it only needs to call
offset_to_l2_slice_index() instead of offset_to_l2_index(). This patch
also renames all the relevant variables and the documentation.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 64cf064c0021ba315d3f3032da0f95db1b615f33.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 05f9ee4689 qcow2: Refactor get_cluster_table()
After the previous patch we're now always using l2_load() in
get_cluster_table() regardless of whether a new L2 table has to be
allocated or not.

This patch refactors that part of the code to use one single l2_load()
call.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: ce31758c4a1fadccea7a6ccb93951eb01d95fd4c.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 3861946a5b qcow2: Update l2_allocate() to support L2 slices
This patch updates l2_allocate() to support the qcow2 cache returning
L2 slices instead of full L2 tables.

The old code simply gets an L2 table from the cache and initializes it
with zeroes or with the contents of an existing table. With a cache
that returns slices instead of tables the idea remains the same, but
the code must now iterate over all the slices that are contained in an
L2 table.

Since now we're operating with slices the function can no longer
return the newly-allocated table, so it's up to the caller to retrieve
the appropriate L2 slice after calling l2_allocate() (note that with
this patch the caller is still loading full L2 tables, but we'll deal
with that in a separate patch).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20fc0415bf0e011e29f6487ec86eb06a11f37445.1517840877.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 6580bb09ab qcow2: Prepare l2_allocate() for adding L2 slice support
Adding support for L2 slices to l2_allocate() needs (among other
things) an extra loop that iterates over all slices of a new L2 table.

Putting all changes in one patch would make it hard to read because
all semantic changes would be mixed with pure indentation changes.

To make things easier this patch simply creates a new block and
changes the indentation of all lines of code inside it. Thus, all
modifications in this patch are cosmetic. There are no semantic
changes and no variables are renamed yet. The next patch will take
care of that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: d0d7dca8520db304524f52f49d8157595a707a35.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia e2b5713eb9 qcow2: Update l2_load() to support L2 slices
Each entry in the qcow2 L2 cache stores a full L2 table (which uses a
complete cluster in the qcow2 image). A cluster is usually too large
to be used efficiently as the size for a cache entry, so we want to
decouple both values by allowing smaller cache entries. Therefore the
qcow2 L2 cache will no longer return full L2 tables but slices
instead.

This patch updates l2_load() so it can handle L2 slices correctly.
Apart from the offset of the L2 table (which we already had) we also
need the guest offset in order to calculate which one of the slices
we need.

An L2 slice has currently the same size as an L2 table (one cluster),
so for now this function will load exactly the same data as before.

This patch also removes a stale comment about the return value being
a pointer to the L2 table. This function returns an error code since
55c17e9821.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b830aa1fc5b6f8e3cb331d006853fe22facca847.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 8f81817577 qcow2: Add offset_to_l2_slice_index()
Similar to offset_to_l2_index(), this function takes a guest offset
and returns the index in the L2 slice that contains its L2 entry.

An L2 slice has currently the same size as an L2 table (one cluster),
so both functions return the same value for now.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: a1c45c5c5a76146dd1712d8d1e7b409ad539c718.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 3c2e511a24 qcow2: Add l2_slice_size field to BDRVQcow2State
The BDRVQcow2State structure contains an l2_size field, which stores
the number of 64-bit entries in an L2 table.

For efficiency reasons we want to be able to load slices instead of
full L2 tables, so we need to know how many entries an L2 slice can
hold.

An L2 slice is the portion of an L2 table that is loaded by the qcow2
cache. At the moment that cache can only load complete tables,
therefore an L2 slice has the same size as an L2 table (one cluster)
and l2_size == l2_slice_size.

Later we'll allow smaller slices, but until then we have to use this
new l2_slice_size field to make the rest of the code ready for that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: adb048595f9fb5dfb110c802a8b3c3be3b937f37.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 05b5b6ee54 qcow2: Add offset_to_l1_index()
Similar to offset_to_l2_index(), this function returns the index in
the L1 table for a given guest offset. This is only used in a couple
of places and it's not a particularly complex calculation, but it
makes the code a bit more readable.

Although in the qcow2_get_cluster_offset() case the old code was
taking advantage of the l1_bits variable, we're going to get rid of
the other uses of l1_bits in a later patch anyway, so it doesn't make
sense to keep it just for this.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: a5f626fed526b7459a0425fad06d823d18df8522.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 6e6fa7605e qcow2: Remove BDS parameter from qcow2_cache_is_table_offset()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_addr(). This is no longer necessary so this
parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: eb0ed90affcf302e5a954bafb5931b5215483d3a.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 77aadd7bed qcow2: Remove BDS parameter from qcow2_cache_discard()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_idx() and qcow2_cache_table_release(). This
is no longer necessary so this parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 9724f7e38e763ad3be32627c6b7fe8df9edb1476.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia b2f68bffab qcow2: Remove BDS parameter from qcow2_cache_clean_unused()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_table_release(). This is no longer necessary so this
parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b74f17591af52f201de0ea3a3b2dd0a81932334d.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia e64d4072d1 qcow2: Remove BDS parameter from qcow2_cache_destroy()
This function was never using the BlockDriverState parameter so it can
be safely removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 49c74fe8b3aead9056e61a85b145ce787d06262b.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 2013c3d44d qcow2: Remove BDS parameter from qcow2_cache_put()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_idx(). This is no longer necessary so this
parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 6f98155489054a457563da77cdad1a66ebb3e896.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 2d135ee92d qcow2: Remove BDS parameter from qcow2_cache_entry_mark_dirty()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_idx(). This is no longer necessary so this
parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 5c40516a91782b083c1428b7b6a41bb9e2679bfb.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia ebe988f318 qcow2: Remove BDS parameter from qcow2_cache_table_release()
This function was only using the BlockDriverState parameter to get the
cache table size (since it was equal to the cluster size). This is no
longer necessary so this parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 7c1b262344375d52544525f85bbbf0548d5ba575.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:58 +01:00
Alberto Garcia b3b8b6d9c4 qcow2: Remove BDS parameter from qcow2_cache_get_table_idx()
This function was only using the BlockDriverState parameter to get the
cache table size (since it was equal to the cluster size). This is no
longer necessary so this parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: da3575d47c9a181a2cfd4715e53dd84a2c651017.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:58 +01:00
Alberto Garcia 9869b27bb5 qcow2: Remove BDS parameter from qcow2_cache_get_table_addr()
This function was only using the BlockDriverState parameter to get the
cache table size (since it was equal to the cluster size). This is no
longer necessary so this parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: e1f943a9e89e1deb876f45de1bb22419ccdb6ad3.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:58 +01:00
Alberto Garcia 03019d7314 qcow2: Add table size field to Qcow2Cache
The table size in the qcow2 cache is currently equal to the cluster
size. This doesn't allow us to use the cache memory efficiently,
particularly with large cluster sizes, so we need to be able to have
smaller cache tables that are independent from the cluster size. This
patch adds a new field to Qcow2Cache that we can use instead of the
cluster size.

The current table size is still being initialized to the cluster size,
so there are no semantic changes yet, but this patch will allow us to
prepare the rest of the code and simplify a few function calls.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 67a1bf9e55f417005c567bead95a018dc34bc687.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:58 +01:00
Alberto Garcia d4c373b854 qcow2: Fix documentation of get_cluster_table()
This function has not been returning the offset of the L2 table since
commit 3948d1d487

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b498733b6706a859a03678d74ecbd26aeba129aa.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:58 +01:00
Vladimir Sementsov-Ogievskiy 3e99da5e76 block: maintain persistent disabled bitmaps
To maintain load/store disabled bitmap there is new approach:

 - deprecate @autoload flag of block-dirty-bitmap-add, make it ignored
 - store enabled bitmaps as "auto" to qcow2
 - store disabled bitmaps without "auto" flag to qcow2
 - on qcow2 open load "auto" bitmaps as enabled and others
   as disabled (except in_use bitmaps)

Also, adjust iotests 165 and 176 appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180202160752.143796-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:58 +01:00
Max Reitz 74f1eabf9c sheepdog: Allow fully preallocated truncation
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 16:18:41 +01:00
Max Reitz 1a62baf62b sheepdog: Pass old and new size to sd_prealloc()
sd_prealloc() will now preallocate the area [old_size, new_size).  As
before, it rounds to buf_size and may thus overshoot and preallocate
areas that were not requested to be preallocated.  For image creation,
this is no change in behavior.  For truncation, this is in accordance
with the documentation for preallocated truncation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 16:18:38 +01:00
Max Reitz 8b9ad56e9c sheepdog: Make sd_prealloc() take a BDS
We want to use this function in sd_truncate() later on, so taking a
filename is not exactly ideal.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 16:18:37 +01:00
Max Reitz c3132aff17 gluster: Add preallocated truncation
By using qemu_do_cluster_truncate() in qemu_cluster_truncate(), we now
automatically have preallocated truncation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 16:18:35 +01:00
Max Reitz f3a33f795c gluster: Query current size in do_truncate()
Instead of expecting the current size to be 0, query it and allocate
only the area [current_size, offset) if preallocation is requested.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 16:18:33 +01:00
Max Reitz 36e87909f8 gluster: Pull truncation from qemu_gluster_create
Pull out the truncation code from the qemu_cluster_create() function so
we can later reuse it in qemu_gluster_truncate().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 16:18:30 +01:00
Max Reitz 0375003df4 gluster: Move glfs_close() to create's clean-up
glfs_close() is a classical clean-up operation, as can be seen by the
fact that it is executed even if the truncation before it failed.
Also, moving it to clean-up makes it more clear that if it fails, we do
not want it to overwrite the current ret value if that signifies an
error already.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 16:18:25 +01:00
Alberto Garcia de7269d293 qcow2: Use g_try_realloc() in qcow2_expand_zero_clusters()
g_realloc() aborts the program if it fails to allocate the required
amount of memory. We want to detect that scenario and return an error
instead, so let's use g_try_realloc().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-13 12:27:17 +01:00
Eric Blake e24d813b29 block: Simplify bdrv_can_write_zeroes_with_unmap()
We don't need the can_write_zeroes_with_unmap field in
BlockDriverInfo, because it is redundant information with
supported_zero_flags & BDRV_REQ_MAY_UNMAP.  Note that
BlockDriverInfo and supported_zero_flags are both per-device
settings, rather than global state about the driver as a
whole, which means one or both of these bits of information
can already be conditional.  Let's audit how they were set:

crypto: always setting can_write_ to false is pointless (the
struct starts life zero-initialized), no use of supported_

nbd: just recently fixed to set can_write_ if supported_
includes MAY_UNMAP (thus this commit effectively reverts
bca80059e and solves the problem mentioned there in a more
global way)

file-posix, iscsi, qcow2: can_write_ is conditional, while
supported_ was unconditional; but passing MAY_UNMAP would
fail with ENOTSUP if the condition wasn't met

qed: can_write_ is unconditional, but pwrite_zeroes lacks
support for MAY_UNMAP and supported_ is not set. Perhaps
support can be added later (since it would be similar to
qcow2), but for now claiming false is no real loss

all other drivers: can_write_ is not set, and supported_ is
either unset or a passthrough

Simplify the code by moving the conditional into
supported_zero_flags for all drivers, then dropping the
now-unused BDI field.  For callers that relied on
bdrv_can_write_zeroes_with_unmap(), we return the same
per-device settings for drivers that had conditions (no
observable change in behavior there); and can now return
true (instead of false) for drivers that support passthrough
(for example, the commit driver) which gives those drivers
the same fix as nbd just got in bca80059e.  For callers that
relied on supported_zero_flags, we now have a few more places
that can avoid a wasted call to pwrite_zeroes() that will
just fail with ENOTSUP.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180126193439.20219-1-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-02-09 12:32:44 -06:00
Markus Armbruster 922a01a013 Move include qemu/option.h from qemu-common.h to actual users
qemu-common.h includes qemu/option.h, but most places that include the
former don't actually need the latter.  Drop the include, and add it
to the places that actually need it.

While there, drop superfluous includes of both headers, and
separate #include from file comment with a blank line.

This cleanup makes the number of objects depending on qemu/option.h
drop from 4545 (out of 4743) to 284 in my "build everything" tree.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-20-armbru@redhat.com>
[Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
2018-02-09 13:52:16 +01:00
Markus Armbruster bbcad965bf Drop superfluous includes of qapi/qmp/qjson.h
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-19-armbru@redhat.com>
2018-02-09 13:52:15 +01:00
Markus Armbruster bd006b9818 Include qapi/qmp/qbool.h exactly where needed
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-15-armbru@redhat.com>
2018-02-09 13:52:15 +01:00
Markus Armbruster 452fcdbc49 Include qapi/qmp/qdict.h exactly where needed
This cleanup makes the number of objects depending on qapi/qmp/qdict.h
drop from 4550 (out of 4743) to 368 in my "build everything" tree.
For qapi/qmp/qobject.h, the number drops from 4552 to 390.

While there, separate #include from file comment with a blank line.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-13-armbru@redhat.com>
2018-02-09 13:52:15 +01:00
Markus Armbruster 47e6b297e7 Include qapi/qmp/qlist.h exactly where needed
This cleanup makes the number of objects depending on qapi/qmp/qlist.h
drop from 4551 (out of 4743) to 16 in my "build everything" tree.

While there, separate #include from file comment with a blank line.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-12-armbru@redhat.com>
2018-02-09 13:52:15 +01:00
Markus Armbruster 15280c360e qdict qlist: Make most helper macros functions
The macro expansions of qdict_put_TYPE() and qlist_append_TYPE() need
qbool.h, qnull.h, qnum.h and qstring.h to compile.  We include qnull.h
and qnum.h in the headers, but not qbool.h and qstring.h.  Works,
because we include those wherever the macros get used.

Open-coding these helpers is of dubious value.  Turn them into
functions and drop the includes from the headers.

This cleanup makes the number of objects depending on qapi/qmp/qnum.h
from 4551 (out of 4743) to 46 in my "build everything" tree.  For
qapi/qmp/qnull.h, the number drops from 4552 to 21.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-10-armbru@redhat.com>
2018-02-09 13:52:15 +01:00
Markus Armbruster 6b67395762 Eliminate qapi/qmp/types.h
qapi/qmp/types.h is a convenience header to include a number of
qapi/qmp/ headers.  Since we rarely need all of the headers
qapi/qmp/types.h includes, we bypass it most of the time.  Most of the
places that use it don't need all the headers, either.

Include the necessary headers directly, and drop qapi/qmp/types.h.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-9-armbru@redhat.com>
2018-02-09 13:52:15 +01:00
Markus Armbruster a82400cf5c Drop superfluous includes of qapi/qmp/qerror.h
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-6-armbru@redhat.com>
2018-02-09 13:51:35 +01:00
Markus Armbruster e688df6bc4 Include qapi/error.h exactly where needed
This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.

While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
2018-02-09 13:50:17 +01:00
Markus Armbruster 8f0a3716e4 Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes, with the change
to target/s390x/gen-features.c manually reverted, and blank lines
around deletions collapsed.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-3-armbru@redhat.com>
2018-02-09 05:05:11 +01:00
Fam Zheng a3d9a352d4 block: Move NVMe constants to a separate header
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180116060901.17413-8-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-08 09:22:03 +08:00
Fam Zheng 9ed616129e block/nvme: Implement .bdrv_(un)register_buf
Forward these two calls to the IOVA manager.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180116060901.17413-6-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-08 09:22:03 +08:00
Fam Zheng 23d0ba9319 block: Introduce buf register API
Allow block driver to map and unmap a buffer for later I/O, as a performance
hint.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180116060901.17413-5-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-08 09:22:03 +08:00
Fam Zheng bdd6a90a9e block: Add VFIO based NVMe driver
This is a new protocol driver that exclusively opens a host NVMe
controller through VFIO. It achieves better latency than linux-aio by
completely bypassing host kernel vfs/block layer.

    $rw-$bs-$iodepth  linux-aio     nvme://
    ----------------------------------------
    randread-4k-1     10.5k         21.6k
    randread-512k-1   745           1591
    randwrite-4k-1    30.7k         37.0k
    randwrite-512k-1  1945          1980

    (unit: IOPS)

The driver also integrates with the polling mechanism of iothread.

This patch is co-authored by Paolo and me.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180116060901.17413-4-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-08 09:22:03 +08:00
Paolo Bonzini 709f213214 curl: convert to CoQueue
Now that CoQueues can use a QemuMutex for thread-safety, there is no
need for curl to roll its own coroutine queue.  Coroutines can be
placed directly on the queue instead of using a list of CURLAIOCBs.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-6-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-08 09:22:03 +08:00
Murilo Opsfelder Araujo fbd5c4c0db block/ssh: fix possible segmentation fault when .desc is not null-terminated
This patch prevents a possible segmentation fault when .desc members are checked
against NULL.

The ssh_runtime_opts was added by commit
8a6a80896d ("block/ssh: Use QemuOpts for runtime
options").

This fix was inspired by
http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg00883.html.

Fixes: 8a6a80896d ("block/ssh: Use QemuOpts for runtime options")
Cc: Max Reitz <mreitz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-01-31 22:37:00 -05:00
Edgar Kaziakhmedov 9776f0db6a nbd: implement bdrv_get_info callback
Since mirror job supports efficient zero out target mechanism (see
in mirror_dirty_init()), implement bdrv_get_info to make it work
over NBD. Such improvement will allow using the largest chunk possible
and will decrease the number of NBD_CMD_WRITE_ZEROES requests on the wire.

Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
Message-Id: <20180118115158.17219-1-edgar.kaziakhmedov@virtuozzo.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-01-26 09:58:46 -06:00
Peter Maydell f78b6f9b11 Block layer patches
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaZyzMAAoJEH8JsnLIjy/WSOYP/jHffS2fOHtdLQU42G76HN0K
 jkhSUE8cuKFzgxuJDJhv10AGsGZvDIaRhuumPIFArQkcEwsFDfd0UqzC5GkCnhfn
 6frVPsRSUp9BqXqha1+6vOgVRobdBXPpS25ERfanTsbu3aPEDRnGpxmpMvyyimft
 BTUnWNCg2lM6bojXrC6oy7MqUdi9p9PviMcQAfnN07SmGa+s6tS2Jc9znvZwgL06
 o+oPukWVTAiub5qcH18BLA3T8xcCXWANdY9pUnNj7mXHoxg3kYzzYBArYDh6Kyju
 BkSEML1kNcUACFAZ+LSqQpnoc8/5cP+jY5cOBGtUUgjZSns/xnAZxALltds0I4m3
 fqQM68oOTX7squAYAaKYVYMirime6aa2OAn2afxPJildPp8uH4lNust95yiUyyJQ
 oqA3zfAnP5FfmTnzjLG7smYlRUlcHp8eMPyOKHxp3BuqTMbWY5KQETyDMk3QVnZr
 7fSFIdT4sRTdroKXUKHHu3RLFyCo77EBovxY2oUtt6v43qxQhLx0IFwW6jHrcLK9
 ifLOr1CqdgwH/OU7h6rzoLcGLX5/eOTxwcCbU0kP2cx4E60VBXmSaDq9TiwQhbeV
 4HteS+EP6R0WpCiAvsFl2aUd6iwDRHeYt0aKpYyUuTVrW2mfmLPaQP+tJLjEoaHF
 H5HlbNWy2gFAB2uQtmOd
 =gjNZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Tue 23 Jan 2018 12:38:36 GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (29 commits)
  iotests: Disable some tests for compat=0.10
  iotests: Split 177 into two parts for compat=0.10
  iotests: Make 059 pass on machines with little RAM
  iotests: Filter compat-dependent info in 198
  iotests: Make 191 work with qcow2 options
  iotests: Make 184 image-less
  iotests: Make 089 compatible with compat=0.10
  iotests: Fix 067 for compat=0.10
  iotests: Fix 059's reference output
  iotests: Fix 051 for compat=0.10
  iotests: Fix 020 for vmdk
  iotests: Skip 103 for refcount_bits=1
  iotests: Forbid 020 for non-file protocols
  iotests: Drop format-specific in _filter_img_info
  iotests: Fix _img_info for backslashes
  block/vmdk: Add blkdebug events
  block/qcow: Add blkdebug events
  qcow2: No persistent dirty bitmaps for compat=0.10
  block/vmdk: Fix , instead of ; at end of line
  qemu-iotests: Fix locking issue in 102
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-01-24 22:55:57 +00:00
Max Reitz 23c4b2a896 block/vmdk: Add blkdebug events
This is certainly not complete, but it includes at least write_aio and
read_aio.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171123020832.8165-5-mreitz@redhat.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-01-23 12:34:43 +01:00
Max Reitz 0abb1475f8 block/qcow: Add blkdebug events
This is not necessarily complete, but it should include the most
important places.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171123020832.8165-4-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-01-23 12:34:43 +01:00
Max Reitz c9ceb3ec8a qcow2: No persistent dirty bitmaps for compat=0.10
Persistent dirty bitmaps require a properly functioning
autoclear_features field, or we cannot track when an unsupporting
program might overwrite them.  Therefore, we cannot support them for
compat=0.10 images.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171123020832.8165-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-01-23 12:34:42 +01:00
Max Reitz 3c363575dc block/vmdk: Fix , instead of ; at end of line
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171123020832.8165-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-01-23 12:34:42 +01:00
Max Reitz ac5b787a6e qcow2: Repair unaligned preallocated zero clusters
We can easily repair unaligned preallocated zero clusters by discarding
them, so why not do it?

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203759.14018-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-01-23 12:34:42 +01:00
Edgar Kaziakhmedov bcbb3866da block/parallels: add backing support to readv/writev
Since parallels format supports backing files, refine
readv/writev (allocate_clusters) to redirect read/write requests
to a backing file (if cluster is not available in the current bs).

Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Klim Kireev <klim.kireev@virtuozzo.com>
Message-id: 20180112090122.1702-6-klim.kireev@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-01-22 14:02:33 +00:00
Klim Kireev 908b1c848e block/parallels: replace some magic numbers
Signed-off-by: Klim Kireev <klim.kireev@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
Message-id: 20180112090122.1702-5-klim.kireev@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-01-22 14:02:33 +00:00
Klim Kireev 90fe66f046 block/parallels: move some structures into header
To implement xml format, some defines and structures
from parallels.c are required.

Signed-off-by: Klim Kireev <klim.kireev@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
Message-id: 20180112090122.1702-4-klim.kireev@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-01-22 14:02:33 +00:00
Klim Kireev ed279a06c5 configure: add dependency
This dependency is required for adequate Parallels images support.
Typically the disk consists of several images which are glued by
XML disk descriptor. Also XML hides inside several important parameters
which are not available in the image header.

The patch also adds clause to checkpatch.pl to understand libxml2 types.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Klim Kireev <klim.kireev@virtuozzo.com>
Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180112090122.1702-3-klim.kireev@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-01-22 14:02:33 +00:00
Peter Lieven 79f9c75e17 block/iscsi: fix initialization of iTask in iscsi_co_get_block_status
in case of unaligned requests or on a target that does not support
block provisioning we leave iTask uninitialized and check iTask.task
for NULL later.

Fixes: e38bc23454
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1515425247-21730-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16 14:54:52 +01:00
Murilo Opsfelder Araujo c4365735a7 block/nbd: fix segmentation fault when .desc is not null-terminated
The find_desc_by_name() from util/qemu-option.c relies on the .name not being
NULL to call strcmp(). This check becomes unsafe when the list is not
NULL-terminated, which is the case of nbd_runtime_opts in block/nbd.c, and can
result in segmentation fault when strcmp() tries to access an invalid memory:

    #0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
    #1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
    #2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
    #3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
    #4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
    #5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395

>From gdb, the desc[i].name was not NULL and resulted in strcmp() accessing an
invalid memory:

    >>> p desc[5]
    $8 = {
      name = 0x1037f098 "R27A",
      type = 1561964883,
      help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
      def_value_str = 0x2 <error: Cannot access memory at address 0x2>
    }
    >>> p desc[6]
    $9 = {
      name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
      type = 272101528,
      help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
      def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
    }

This patch fixes the segmentation fault in strcmp() by adding a NULL element at
the end of nbd_runtime_opts.desc list, which is the common practice to most of
other structs like runtime_opts in block/null.c. Thus, the desc[i].name != NULL
check becomes safe because it will not evaluate to true when .desc list reached
its end.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1727259
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Message-Id: <20180105133241.14141-2-muriloo@linux.vnet.ibm.com>
CC: qemu-stable@nongnu.org
Fixes: 7ccc44fd7d
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-01-08 09:12:23 -06:00
Kevin Wolf 1a63a90750 block: Keep nodes drained between reopen_queue/multiple
The bdrv_reopen*() implementation doesn't like it if the graph is
changed between queuing nodes for reopen and actually reopening them
(one of the reasons is that queuing can be recursive).

So instead of draining the device only in bdrv_reopen_multiple(),
require that callers already drained all affected nodes, and assert this
in bdrv_reopen_queue().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf 44487eb973 commit: Simplify reopen of base
Since commit bde70715, base is the only node that is reopened in
commit_start(). This means that the code, which still involves an
explicit BlockReopenQueue, can now be simplified by using bdrv_reopen().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf d736f119da block: Allow graph changes in subtree drained section
We need to remember how many of the drain sections in which a node is
were recursive (i.e. subtree drain rather than node drain), so that they
can be correctly applied when children are added or removed during the
drained section.

With this change, it is safe to modify the graph even inside a
bdrv_subtree_drained_begin/end() section.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf b016558590 block: Add bdrv_subtree_drained_begin/end()
bdrv_drained_begin() waits for the completion of requests in the whole
subtree, but it only actually keeps its immediate bs parameter quiesced
until bdrv_drained_end().

Add a version that keeps the whole subtree drained. As of this commit,
graph changes cannot be allowed during a subtree drained section, but
this will be fixed soon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf 0152bf400f block: Don't notify parents in drain call chain
This is in preparation for subtree drains, i.e. drained sections that
affect not only a single node, but recursively all child nodes, too.

Calling the parent callbacks for drain is pointless when we just came
from that parent node recursively and leads to multiple increases of
bs->quiesce_counter in a single drain call. Don't do it.

In order for this to work correctly, the parent callback must be called
for every bdrv_drain_begin/end() call, not only for the outermost one:

If we have a node N with two parents A and B, recursive draining of A
should cause the quiesce_counter of B to increase because its child N is
drained independently of B. If now B is recursively drained, too, A must
increase its quiesce_counter because N is drained independently of A
only now, even if N is going from quiesce_counter 1 to 2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf 0f11516894 block: Nested drain_end must still call callbacks
bdrv_do_drained_begin() restricts the call of parent callbacks and
aio_disable_external() to the outermost drain section, but the block
driver callbacks are always called. bdrv_do_drained_end() must match
this behaviour, otherwise nodes stay drained even if begin/end calls
were balanced.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf 8119334918 block: Don't block_job_pause_all() in bdrv_drain_all()
Block jobs are already paused using the BdrvChildRole drain callbacks,
so we don't need an additional block_job_pause_all() call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:32 +01:00
Kevin Wolf 7b6a3d3553 block: Make bdrv_drain() driver callbacks non-recursive
bdrv_drained_begin() doesn't increase bs->quiesce_counter recursively
and also doesn't notify other parent nodes of children, which both means
that the child nodes are not actually drained, and bdrv_drained_begin()
is providing useful functionality only on a single node.

To keep things consistent, we also shouldn't call the block driver
callbacks recursively.

A proper recursive drain version that provides an actually working
drained section for child nodes will be introduced later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2017-12-22 15:05:31 +01:00
Kevin Wolf 9a7e86c804 block: Assert drain_all is only called from main AioContext
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2017-12-22 15:05:31 +01:00
Fam Zheng 8e77e0bceb block: Remove unused bdrv_requests_pending
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:05:31 +01:00
Edgar Kaziakhmedov 546a7dc40e qcow2: get rid of qcow2_backing_read1 routine
Since bdrv_co_preadv does all neccessary checks including
reading after the end of the backing file, avoid duplication
of verification before bdrv_co_preadv call.

Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22 15:03:41 +01:00
Kevin Wolf 60369b86c4 block: Unify order in drain functions
Drain requests are propagated to child nodes, parent nodes and directly
to the AioContext. The order in which this happened was different
between all combinations of drain/drain_all and begin/end.

The correct order is to keep children only drained when their parents
are also drained. This means that at the start of a drained section, the
AioContext needs to be drained first, the parents second and only then
the children. The correct order for the end of a drained section is the
opposite.

This patch changes the three other functions to follow the example of
bdrv_drained_begin(), which is the only one that got it right.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-12-22 15:03:41 +01:00
Kevin Wolf 5280aa32e1 block: Don't wait for requests in bdrv_drain*_end()
The device is drained, so there is no point in waiting for requests at
the end of the drained section. Remove the bdrv_drain_recurse() calls
there.

The bdrv_drain_recurse() calls were introduced in commit 481cad48e5
in order to call the .bdrv_co_drain_end() driver callback. This is now
done by a separate bdrv_drain_invoke() call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-12-22 15:03:41 +01:00
Kevin Wolf 99c05de918 block: bdrv_drain_recurse(): Remove unused begin parameter
Now that the bdrv_drain_invoke() calls are pulled up to the callers of
bdrv_drain_recurse(), the 'begin' parameter isn't needed any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-12-22 15:03:41 +01:00
Kevin Wolf 2da9b7d456 block: Call .drain_begin only once in bdrv_drain_all_begin()
bdrv_drain_all_begin() used to call the .bdrv_co_drain_begin() driver
callback inside its polling loop. This means that how many times it got
called for each node depended on long it had to poll the event loop.

This is obviously not right and results in nodes that stay drained even
after bdrv_drain_all_end(), which calls .bdrv_co_drain_begin() once per
node.

Fix bdrv_drain_all_begin() to call the callback only once, too.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-12-22 15:03:41 +01:00
Kevin Wolf db0289b9b2 block: Make bdrv_drain_invoke() recursive
This change separates bdrv_drain_invoke(), which calls the BlockDriver
drain callbacks, from bdrv_drain_recurse(). Instead, the function
performs its own recursion now.

One reason for this is that bdrv_drain_recurse() can be called multiple
times by bdrv_drain_all_begin(), but the callbacks may only be called
once. The separation is necessary to fix this bug.

The other reason is that we intend to go to a model where we call all
driver callbacks first, and only then start polling. This is not fully
achieved yet with this patch, as bdrv_drain_invoke() contains a
BDRV_POLL_WHILE() loop for the block driver callbacks, which can still
call callbacks for any unrelated event. It's a step in this direction
anyway.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-12-22 15:03:41 +01:00
Peter Lieven e38bc23454 block/iscsi: only report an iSCSI Failure if we don't handle it gracefully
we currently report an "iSCSI Failure" in iscsi_co_generic_cb if the task
hasn't completed with SCSI_STATUS_GOOD. However, we expect a failure in
some cases and handle it gracefully. This is the case for misaligned UNMAPs
and WRITESAME10/16 calls without UNMAP. In this case a failure in the
logs can be quite misleading.

While we are at it improve the logging to reveal which operation failed
at what LBA.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1512733868-9009-3-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-21 09:30:32 +01:00
Peter Lieven aef172ffdc block/iscsi: dont leave allocmap in an invalid state on UNMAP failure
we forgot to set the allocmap to invalid if an UNMAP call fails.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1512733868-9009-2-git-send-email-pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-21 09:30:31 +01:00
Peter Maydell f1faf2d59c Pull request
v2:
  * Fixed incorrect virtio_blk_data_plane_create() local_err refactoring in
    "hw/block: Use errp directly rather than local_err" that broke virtio-blk
    over virtio-mmio [Peter]
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaOSteAAoJEJykq7OBq3PIllkH/RkxTY6JIe9K8PRVsaAX2fRN
 edO/3E09KTQe9eHEixoMKOIyeKi3RPdipcktXIbdLIDEY4z4vELmQslTrxK/q+8J
 pccdwu+7tEXr14ciYSnq0m6ksvU5JHlJGyAJEvbCmLHE3dPJszABwT1XLLCb1C8s
 hSOr3nR/O2U3LHlq/FuvEUK8fohgKlECtE94V/DUWyC774iMw+9OdvTA0VQWYnN6
 B0gpYSn4AXmdt5HmpgCa+5rZrT2DjdwhtR9X+iOItPoXJPP81toUxvshLbTgdL54
 fSodd12Tbn2Pxr/osD1kwzM9z6oYX8Ay8YZTabODiFo20fhZKZ2wLpL4rrsNnBk=
 =Qcx2
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

v2:
 * Fixed incorrect virtio_blk_data_plane_create() local_err refactoring in
   "hw/block: Use errp directly rather than local_err" that broke virtio-blk
   over virtio-mmio [Peter]

# gpg: Signature made Tue 19 Dec 2017 15:08:14 GMT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request: (23 commits)
  qemu-iotests: add 203 savevm with IOThreads test
  iothread: fix iothread_stop() race condition
  iotests: add VM.add_object()
  blockdev: add x-blockdev-set-iothread force boolean
  docs: mark nested AioContext locking as a legacy API
  block: avoid recursive AioContext acquire in bdrv_inactivate_all()
  virtio-blk: reject configs with logical block size > physical block size
  virtio-blk: make queue size configurable
  qemu-iotests: add 202 external snapshots IOThread test
  blockdev: add x-blockdev-set-iothread testing command
  iothread: add iothread_by_id() API
  block: drop unused BlockDirtyBitmapState->aio_context field
  block: don't keep AioContext acquired after internal_snapshot_prepare()
  block: don't keep AioContext acquired after blockdev_backup_prepare()
  block: don't keep AioContext acquired after drive_backup_prepare()
  block: don't keep AioContext acquired after external_snapshot_prepare()
  blockdev: hold AioContext for bdrv_unref() in external_snapshot_clean()
  qdev: drop unused #include "sysemu/iothread.h"
  dev-storage: Fix the unusual function name
  hw/block: Use errp directly rather than local_err
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	hw/core/qdev-properties-system.c
2017-12-20 11:30:55 +00:00
Peter Maydell 03c1c09d56 -----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaOC2xAAoJEL2+eyfA3jBXeOcP/RUB2PDP6xC+gEtQ/pNTuT8a
 SY+2ZnxeCl/n6grW7Tf4IXwDZCTIM4+fthODJe3gmF0u8RkaEIm4Km4lf9Bc2gal
 2ZmZoXk00va4FHlv7CVMQV/kcl3BudUqA7bwlGm8HQAEFkYaYtSrpRIlCrvohQIX
 hb3hPme1Ia/GGz0W+QTli6bKxBdgXXO20ZOt03UkVRLBO5LN/U9VhHh2Uu1+QOaX
 bpuqDnfRmkZ5iKGSMkNLT81PAOSQelHz3eej8UhdQz2F1KU2RTWaZ6E3Y/Ky6qah
 OL5vmV4BJaOlX+c+Vkg97TxChccAJPb1TRXiUl1Ypo8YPOBnuJ0CSBbadI2CnFu4
 hAbGvzs77mOq1B3zY1gNMzVpK0R/nSPmXi63tY02VdedSbJ2s7dVyP49oV3Cko89
 8XGdUOD6fn5goaA+GwMPs6iQQZBSRCqX7L3tawIcHHqi09BG6D/MNqzcTV8DjJ9Z
 6UY1nGmPgO44fQmpkt+NJwYbEw8oENYEZZxsPupaFVqmIyPyjbhwV0ISLAGZtOWA
 lb4hETJ3a0dFhu6FawZyw2sDpTaSu+7AgIrBrmFFUUciYqWJr9xfAsvbFSqNARX2
 Msihq9T6oXfefO+d56C1AIJmIz+LL5KtAUfNaD9qLgHTcyMvfdPQJl5lLyXs7/NK
 RL6MRSe3KL5GQsga1nK7
 =8vnb
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Mon 18 Dec 2017 21:05:53 GMT
# gpg:                using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  block/curl: fix minor memory leaks
  block/curl: check error return of curl_global_init()
  block/sheepdog: code beautification
  block/sheepdog: remove spurious NULL check
  blockjob: kick jobs on set-speed
  backup: use copy_bitmap in incremental backup
  backup: simplify non-dirty bits progress processing
  backup: init copy_bitmap from sync_bitmap for incremental
  backup: move from done_bitmap to copy_bitmap
  hbitmap: add next_zero function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-12-19 17:44:42 +00:00
Stefan Hajnoczi 78f1d3d6a6 coroutine: simplify co_aio_sleep_ns() prototype
The AioContext pointer argument to co_aio_sleep_ns() is only used for
the sleep timer.  It does not affect where the caller coroutine is
resumed.

Due to changes to coroutine and AIO APIs it is now possible to drop the
AioContext pointer argument.  This is safe to do since no caller has
specific requirements for which AioContext the timer must run in.

This patch drops the AioContext pointer argument and renames the
function to simplify the API.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20171109102652.6360-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-12-19 09:25:27 +00:00
Jeff Cody 996922de45 block/curl: fix minor memory leaks
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 15:44:39 -05:00
Jeff Cody 2d25964d18 block/curl: check error return of curl_global_init()
If curl_global_init() fails, per the documentation no other curl
functions may be called, so make sure to check the return value.

Also, some minor changes to the initialization latch variable 'inited':

- Make it static in the file, for clarity
- Change the name for clarity
- Make it a bool

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 15:42:07 -05:00
Jeff Cody d507c5f682 block/sheepdog: code beautification
No functional changes, just whitespace manipulation.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 15:41:44 -05:00
Jeff Cody ac90dad94b block/sheepdog: remove spurious NULL check
'tag' is already checked in the lines immediately preceding this check,
and set to non-NULL if NULL.  No need to check again, it hasn't changed.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 15:41:17 -05:00
Vladimir Sementsov-Ogievskiy 53f1c8794f backup: use copy_bitmap in incremental backup
We can use copy_bitmap instead of sync_bitmap. copy_bitmap is
initialized from sync_bitmap and it is more informative: we will not try
to process data, that is already in progress (by write notifier).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20171012135313.227864-6-vsementsov@virtuozzo.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy 085bd08e6f backup: simplify non-dirty bits progress processing
Set fake progress for non-dirty clusters in copy_bitmap initialization,
to. It simplifies code and allows further refactoring.

This patch changes user's view of backup progress, but formally it
doesn't changed: progress hops are just moved to the beginning.

Actually it's just a point of view: when do we actually skip clusters?
We can say in the very beginning, that we skip these clusters and do
not think about them later.

Of course, if go through disk sequentially, it's logical to say, that
we skip clusters between copied portions to the left and to the right
of them. But even now copying progress is not sequential because of
write notifiers. Future patches will introduce new backup architecture
which will do copying in several coroutines in parallel, so it will
make no sense to publish fake progress by parts in parallel with
other copying requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20171012135313.227864-5-vsementsov@virtuozzo.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy 8cc6dc6215 backup: init copy_bitmap from sync_bitmap for incremental
We should not copy non-dirty clusters in write notifiers. So,
initialize copy_bitmap from sync_bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171012135313.227864-4-vsementsov@virtuozzo.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy a193b0f0a8 backup: move from done_bitmap to copy_bitmap
Use HBitmap copy_bitmap instead of done_bitmap. This is needed to
improve incremental backup in following patches and to unify backup
loop for full/incremental modes in future patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20171012135313.227864-3-vsementsov@virtuozzo.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy 56207df55e hbitmap: add next_zero function
The function searches for next zero bit.
Also add interface for BdrvDirtyBitmap and unit test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20171012135313.227864-2-vsementsov@virtuozzo.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-12-18 10:54:13 -05:00
Philippe Mathieu-Daudé 7d98febd67 block: remove "qemu/osdep.h" from header file
applied using ./scripts/clean-includes

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-12-18 17:07:02 +03:00
Peter Lieven f1a7ff770f block/nfs: fix nfs_client_open for filesize greater than 1TB
DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE) was overflowing ret (int) if
st.st_size is greater than 1TB.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1511798407-31129-1-git-send-email-pl@kamp.de
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-29 15:28:15 +01:00
Paolo Bonzini 5bf1d5a73a blockjob: remove clock argument from block_job_sleep_ns
All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
support more than one clock when block_job_sleep_ns switches to a single
timer stored in the BlockJob struct.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-29 15:11:02 +01:00
Kevin Wolf 02d213009d block: Expect graph changes in bdrv_parent_drained_begin/end
The .drained_begin/end callbacks can (directly or indirectly via
aio_poll()) cause block nodes to be removed or the current BdrvChild to
point to a different child node.

Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
BlockDriverStates or accidentally continue iterating the parents of the
new child node instead of the node we actually came from.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-29 14:22:03 +01:00
Kevin Wolf 70a5afedd6 block: Error out on load_vm with active dirty bitmaps
Loading a snapshot invalidates the bitmap. Just marking all blocks dirty
is not a useful response in practice, instead the user needs to be aware
that we switch to a completely different state. If they are okay with
losing the dirty bitmap, they can just explicitly delete it.

This effectively reverts commit 04dec3c3ae.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2017-11-21 14:48:23 +01:00
Kevin Wolf 2b624fe079 block: Add errp to bdrv_all_goto_snapshot()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2017-11-21 14:48:22 +01:00
Kevin Wolf 0b62bcbc61 block: Add errp to bdrv_snapshot_goto()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2017-11-21 14:48:22 +01:00
Kevin Wolf 1f4ad7d3b8 block: Don't request I/O permission with BDRV_O_NO_IO
'qemu-img info' makes sense even when BLK_PERM_CONSISTENT_READ cannot be
granted because of a block job in a running qemu process. It already
sets BDRV_O_NO_IO to indicate that it doesn't access the guest visible
data at all.

Check the BDRV_O_NO_IO flags in blk_new_open(), so that I/O related
permissions are not unnecessarily requested and 'qemu-img info' can work
even if BLK_PERM_CONSISTENT_READ cannot be granted.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2017-11-21 14:48:22 +01:00
Peter Maydell 2e02083438 Block layer patches for 2.11.0-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDyNMAAoJEH8JsnLIjy/WP+oP/1G0r1b9PBtrEc+B9QG0sMh1
 qivbMRmNUFhuzYW8VBzb2Cpsl/lpe/VlAumf75RUACOh4ZfxkOhaYTMB5x9FHvoa
 5w35UC382zuXTMfeF4JufF5rVJdB+bB9THJOJ7cuaDQSHZNPo8uMmHrk4J5p5uuU
 IrOZLfqgQaWGdobGe6FPWy67z2PRWrzEI/Ogr8zwpa95GJhX3fY1CmGvt2MUBG+j
 cwdYcuX9pawFfjasCkfwYMzY7Uc9wvjIJYSe+WvrDrpjInISgxMOvXylzdWkpsGS
 rHnsLgRXnUE4BmOavzq+uvlo14hlrmdVn5UYlUkbJcx+Bkp7MZGuNjX6doimCdWg
 PpbQB3TIi7ce4v9cYWZSrzDDo6E/d1Evi3xMG0ZouU0ff1DPFIilm8lxb3xYia+b
 ItTLZ7yXWLijGhr4jTeGVhn0+zENiczyiQsL1sDdj83VnpAOGPNZTwtnNGug8ATJ
 VELwA7jlVEL47HYovQoUIFs0NA5xCbOQm5avS5gk44PB/dNkYGKdHhwgQUV/R36A
 lRNj7Vh7hwHSyHO/gKi3sqXTbNG/fTDSy1Nu1xc0pAROIKpc6AanP0mFn2DlZS35
 kyZbxMbuGUsDQ2j0pcGuP60O7+2CY4jlXRCXo6vaHzP3xDfOTx2L9UYVfk/dwNJu
 uYbnbXWv06igjm3Y/2Vv
 =ClZj
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.11.0-rc2

# gpg: Signature made Fri 17 Nov 2017 17:58:36 GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (25 commits)
  iotests: Make 087 pass without AIO enabled
  block: Make bdrv_next() keep strong references
  qcow2: Fix overly broad madvise()
  qcow2: Refuse to get unaligned offsets from cache
  qcow2: Add bounds check to get_refblock_offset()
  block: Guard against NULL bs->drv
  qcow2: Unaligned zero cluster in handle_alloc()
  qcow2: check_errors are fatal
  qcow2: reject unaligned offsets in write compressed
  iotests: Add test for failing qemu-img commit
  tests: Add check-qobject for equality tests
  iotests: Add test for non-string option reopening
  block: qobject_is_equal() in bdrv_reopen_prepare()
  qapi: Add qobject_is_equal()
  qapi/qlist: Add qlist_append_null() macro
  qapi/qnull: Add own header
  qcow2: fix image corruption on commit with persistent bitmap
  iotests: test clearing unknown autoclear_features by qcow2
  block: Fix permissions in image activation
  qcow2: fix image corruption after committing qcow2 image into base
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-11-17 19:08:07 +00:00
Max Reitz 5e003f17ec block: Make bdrv_next() keep strong references
On one hand, it is a good idea for bdrv_next() to return a strong
reference because ideally nearly every pointer should be refcounted.
This fixes intermittent failure of iotest 194.

On the other, it is absolutely necessary for bdrv_next() itself to keep
a strong reference to both the BB (in its first phase) and the BDS (at
least in the second phase) because when called the next time, it will
dereference those objects to get a link to the next one.  Therefore, it
needs these objects to stay around until then.  Just storing the pointer
to the next in the iterator is not really viable because that pointer
might become invalid as well.

Both arguments taken together means we should probably just invoke
bdrv_ref() and blk_ref() in bdrv_next().  This means we have to assert
that bdrv_next() is always called from the main loop, but that was
probably necessary already before this patch and judging from the
callers, it also looks to actually be the case.

Keeping these strong references means however that callers need to give
them up if they decide to abort the iteration early.  They can do so
through the new bdrv_next_cleanup() function.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110172545.32609-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:31 +01:00
Max Reitz 08546bcfb2 qcow2: Fix overly broad madvise()
@mem_size and @offset are both size_t, thus subtracting them from one
another will just return a big size_t if mem_size < offset -- even more
obvious here because the result is stored in another size_t.

Checking that result to be positive is therefore not sufficient to
exclude the case that offset > mem_size.  Thus, we currently sometimes
issue an madvise() over a very large address range.

This is triggered by iotest 163, but with -m64, this does not result in
tangible problems.  But with -m32, this test produces three segfaults,
all of which are fixed by this patch.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171114184127.24238-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:31 +01:00
Max Reitz 4efb1f7c61 qcow2: Refuse to get unaligned offsets from cache
Instead of using an assertion, it is better to emit a corruption event
here.  Checking all offsets for correct alignment can be tedious and it
is easily possible to forget to do so.  qcow2_cache_do_get() is a
function every L2 and refblock access has to go through, so this is a
good central point to add such a check.

And for good measure, let us also add an assertion that the offset is
non-zero.  Making this a corruption event is not feasible, because a
zero offset usually means something special (such as the cluster is
unused), so all callers should be checking this anyway.  If they do not,
it is their fault, hence the assertion here.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-6-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:31 +01:00
Max Reitz 23482f8a60 qcow2: Add bounds check to get_refblock_offset()
Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728661
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-5-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:31 +01:00
Max Reitz d470ad42ac block: Guard against NULL bs->drv
We currently do not guard everywhere against a NULL bs->drv where we
should be doing so.  Most of the places fixed here just do not care
about that case at all.

Some care implicitly, e.g. through a prior function call to
bdrv_getlength() which would always fail for an ejected BDS.  Add an
assert there to make it more obvious.

Other places seem to care, but do so insufficiently: Freeing clusters in
a qcow2 image is an error-free operation, but it may leave the image in
an unusable state anyway.  Giving qcow2_free_clusters() an error code is
not really viable, it is much easier to note that bs->drv may be NULL
even after a successful driver call.  This concerns bdrv_co_flush(), and
the way the check is added to bdrv_co_pdiscard() (in every iteration
instead of only once).

Finally, some places employ at least an assert(bs->drv); somewhere, that
may be reasonable (such as in the reopen code), but in
bdrv_has_zero_init(), it is definitely not.  Returning 0 there in case
of an ejected BDS saves us much headache instead.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728660
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-4-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:31 +01:00
Max Reitz 93bbaf03ff qcow2: Unaligned zero cluster in handle_alloc()
We should check whether the cluster offset we are about to use is
actually valid; that is, whether it is aligned to cluster boundaries.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728643
Buglink: https://bugs.launchpad.net/qemu/+bug/1728657
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:30 +01:00
Max Reitz 791fff504c qcow2: check_errors are fatal
When trying to repair a dirty image, qcow2_check() may apparently
succeed (no really fatal error occurred that would prevent the check
from continuing), but if check_errors in the result object is non-zero,
we cannot trust the image to be usable.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728639
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:30 +01:00
Anton Nefedov 3e3b838ffe qcow2: reject unaligned offsets in write compressed
Misaligned compressed write is not supported.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Message-id: 1510654613-47868-2-git-send-email-anton.nefedov@virtuozzo.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:30 +01:00
Eric Blake 4096974e18 qcow2: fix image corruption on commit with persistent bitmap
If an image contains persistent bitmaps, we cannot use the
fast path of bdrv_make_empty() to clear the image during
qemu-img commit, because that will lose the clusters related
to the bitmaps.

Also leave a comment in qcow2_read_extensions to remind future
feature additions to think about fast-path removal, since we
just barely fixed the same bug for LUKS encryption.

It's a pain that qemu-img has not yet been taught to manipulate,
or even at a very minimum display, information about persistent
bitmaps; instead, we have to use QMP commands.  It's also a
pain that only qeury-block and x-debug-block-dirty-bitmap-sha256
will allow bitmap introspection; but the former requires the
node to be hooked to a block device, and the latter is experimental.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-17 18:21:01 +01:00
Eric Blake 08ace1d753 nbd: Don't crash when server reports NBD_CMD_READ failure
If a server fails a read, for example with EIO, but the connection
is still live, then we would crash trying to print a non-existent
error message in nbd_client_co_preadv().  For consistency, also
change the error printout in nbd_read_reply_entry(), although that
instance does not crash.  Bug introduced in commit f140e300.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171112013936.5942-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-17 08:02:45 -06:00
Daniel P. Berrange f06033295b qcow2: fix image corruption after committing qcow2 image into base
After committing the qcow2 image contents into the base image, qemu-img
will call bdrv_make_empty to drop the payload in the layered image.

When this is done for qcow2 images, it blows away the LUKS encryption
header, making the resulting image unusable. There are two codepaths
for emptying a qcow2 image, and the second (slower) codepath leaves
the LUKS header intact, so force use of that codepath.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-17 13:36:03 +01:00
Kevin Wolf 398e6ad014 block: Deprecate bdrv_set_read_only() and users
bdrv_set_read_only() is used by some block drivers to override the
read-only option given by the user. This is not how read-only images
generally work in QEMU: Instead of second guessing what the user really
meant (which currently includes making an image read-only even if the
user didn't only use the default, but explicitly said read-only=off), we
should error out if we can't provide what the user requested.

This adds deprecation warnings to all callers of bdrv_set_read_only() so
that the behaviour can be corrected after the usual deprecation period.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-17 13:35:59 +01:00
Daniel P. Berrange f66afbe26f qcow2: don't permit changing encryption parameters
Currently if trying to change encryption parameters on a qcow2 image, qemu-img
will abort. We already explicitly check for attempt to change encrypt.format
but missed other parameters like encrypt.key-secret. Rather than list each
parameter, just blacklist changing of all parameters with a 'encrypt.' prefix.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-17 13:35:59 +01:00
Wang Guang 611e0653ad replication: Fix replication open fail
replication_child_perm request write
permissions for all child which will lead bdrv_check_perm fail.
replication_child_perm() should request write
permissions only if it is writable itself.

Signed-off-by: Wang Guang <wang.guang55@zte.com.cn>
Signed-off-by: Wang Yong <wang.yong155@zte.com.cn>
Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-17 13:35:59 +01:00
Stefan Hajnoczi 341e0b5658 throttle-groups: forget timer and schedule next TGM on detach
tg->any_timer_armed[] must be cleared when detaching pending timers from
the AioContext.  Failure to do so leads to hung I/O because it looks
like there are still timers pending when in fact they have been removed.

Other ThrottleGroupMembers might have requests pending too so it's
necessary to schedule the next TGM so it can set a timer.

This patch fixes hung I/O when QEMU is launched with drives that are in
the same throttling group:

  (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 &
  (guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 &
  (qemu) stop
  (qemu) cont
  ...I/O is stuck...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20171116112150.27607-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-16 14:12:57 +00:00
Jeff Cody 1d0f37cf21 block/parallels: add migration blocker
Migration does not work for parallels, and has been broken for a while
(see patch 'block/parallels: Do not update header or truncate image when
 INMIGRATE').  The bdrv_invalidate_cache() method needs to be added for
migration to be supported.  Until this is done, prohibit migration.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 5e04a7c8a3089913fa58d484af42dab7993984ad.1510059970.git.jcody@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:26 +01:00
Jeff Cody 6c7d390b99 block/parallels: Do not update header or truncate image when INMIGRATE
If we write or modify the image file while the QEMU run state is
INMIGRATE, then the BDRV_O_INACTIVE BDS flag is set.  This will cause
an assert, since the image is marked inactive.  Make sure we obey this
flag.

Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 3996c930fa8cde8570b7a63032720d76a28fd78b.1510059970.git.jcody@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Jeff Cody 7479bf07c4 block/vhdx.c: Don't blindly update the header
The VHDX specification requires that before user data modification of
the vhdx image, the VHDX header file and data GUIDs need to be updated.
In vhdx_open(), if the image is set to RDWR, we go ahead and update the
header.

However, just because the image is set to RDWR does not mean we can go
ahead and write at this point - specifically, if the QEMU run state is
INMIGRATE, the underlying file BS may be set to inactive via the BDS
open flag of BDRV_O_INACTIVE.  Attempting to write under this condition
will cause an assert in bdrv_co_pwritev().

We can alternatively latch the first time the image is written.  And lo
and behold, we do just that, via vhdx_user_visible_write() in
vhdx_co_writev().  This means the call to vhdx_update_headers() in
vhdx_open() is likely just vestigial, and can be removed.

Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 659e4cdba6ef4c651737852777c8c93d27b38040.1510059970.git.jcody@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Vladimir Sementsov-Ogievskiy 04dec3c3ae block/snapshot: dirty all dirty bitmaps on snapshot-switch
Snapshot-switch actually changes active state of disk so it should
reflect on dirty bitmaps. Otherwise next incremental backup using
these bitmaps will be invalid.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20171023092945.54532-1-vsementsov@virtuozzo.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Alberto Garcia c9b83e9c23 qcow2: Assert that the crypto header does not overlap other metadata
The crypto header is initialized only when QEMU is creating a new
image, so there's no chance of this happening on a corrupted image.

If QEMU is really trying to allocate the header overlapping other
existing metadata sections then this is a serious bug in QEMU itself
so let's add an assertion.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: ae3d77f312fc0c5e0ac2bbd71676c0112eebe2e5.1509718618.git.berto@igalia.com
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Alberto Garcia 951053a9ec qcow2: Don't open images with header.refcount_table_clusters == 0
qcow2_do_open() is checking that header.refcount_table_clusters is not
too large, but it doesn't check that it's greater than zero. Apart
from the fact that an image like that is obviously corrupted, trying
to use it crashes QEMU since we end up with a null s->refcount_table
after qcow2_refcount_init().

These images can however be repaired, so allow opening them if the
BDRV_O_CHECK flag is set.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: f9750f50c80359babba11062e88f5075a47e8e16.1509718618.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Alberto Garcia 8aa34834d5 qcow2: Prevent allocating compressed clusters at offset 0
If the refcount data is corrupted then we can end up trying to
allocate a new compressed cluster at offset 0 in the image, triggering
an assertion in qcow2_alloc_bytes() that would crash QEMU:

  qcow2_alloc_bytes: Assertion `offset' failed.

This patch adds an explicit check for this scenario and a new test
case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: fb53467cf48e95ff3330def1cf1003a5b862b7d9.1509718618.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Alberto Garcia 9883975050 qcow2: Prevent allocating L2 tables at offset 0
If the refcount data is corrupted then we can end up trying to
allocate a new L2 table at offset 0 in the image, triggering an
assertion in the qcow2 cache that would crash QEMU:

  qcow2_cache_entry_mark_dirty: Assertion `c->entries[i].offset != 0' failed

This patch adds an explicit check for this scenario and a new test
case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 92dac37191ae7844a2da22c122204eb493cc3133.1509718618.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Alberto Garcia 6bf45d59f9 qcow2: Prevent allocating refcount blocks at offset 0
Each entry in the qcow2 cache contains an offset field indicating the
location of the data in the qcow2 image. If the offset is 0 then it
means that the entry contains no data and is available to be used when
needed.

Because of that it is not possible to store in the cache the first
cluster of the qcow2 image (offset = 0). This is not a problem because
that cluster always contains the qcow2 header and we're not using this
cache for that.

However, if the qcow2 image is corrupted it can happen that we try to
allocate a new refcount block at offset 0, triggering this assertion
and crashing QEMU:

  qcow2_cache_entry_mark_dirty: Assertion `c->entries[i].offset != 0' failed

This patch adds an explicit check for this scenario and a new test
case.

This problem was originally reported here:

   https://bugs.launchpad.net/qemu/+bug/1728615

Reported-by: R.Nageswara Sastry <nasastry@in.ibm.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 92a2fadd10d58b423f269c1d1a309af161cdc73f.1509718618.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Peter Maydell 191b5fbfa6 Pull request
The following disk I/O throttling fixes solve recent bugs.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaCsdYAAoJEJykq7OBq3PIZdMH/10xOuxOjvjxlJNkQquhrAmD
 y9dVj0jEtopdter/XR7ZCsww1UgxpIt8K43Dk1yWTKrm2bNN1v3cqemJV+UUTLFl
 LppKxt5Cm1JRKaCfN0hSwOp5pFJumzH6creVdQMQ3VNCSSw6xfV94pupaVE8at6D
 n4r3ZDF03ARETMJW7HY7QIFi1YVcfmi4wrx8rfhEGLZu06nHrtFQsDdH7SeErgXi
 wJh+ksji4EvX2xc54nhprCsc9HdzbfeBEYx6tdD0Uh3xm7xXd2oka5Rac74WuqYu
 B4aKwyFbvKZ0DYnENiOCkemTN51s+0GHLz43T92/DmQhJrBy8EU4TTCn73vgmto=
 =KnUT
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

The following disk I/O throttling fixes solve recent bugs.

# gpg: Signature made Tue 14 Nov 2017 10:37:12 GMT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  qemu-iotests: Test I/O limits with removable media
  block: Leave valid throttle timers when removing a BDS from a backend
  block: Check for inserted BlockDriverState in blk_io_limits_disable()
  throttle-groups: drain before detaching ThrottleState
  block: all I/O should be completed before removing throttle timers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-11-14 16:11:19 +00:00
Alberto Garcia c89bcf3af0 block: Leave valid throttle timers when removing a BDS from a backend
If a BlockBackend has I/O limits set then its ThrottleGroupMember
structure uses the AioContext from its attached BlockDriverState.
Those two contexts must be kept in sync manually. This is not
ideal and will be fixed in the future by removing the throttling
configuration from the BlockBackend and storing it in an implicit
filter node instead, but for now we have to live with this.

When you remove the BlockDriverState from the backend then the
throttle timers are destroyed. If a new BlockDriverState is later
inserted then they are created again using the new AioContext.

There are a couple of problems with this:

   a) The code manipulates the timers directly, leaving the
      ThrottleGroupMember.aio_context field in an inconsisent state.

   b) If you remove the I/O limits (e.g by destroying the backend)
      when the timers are gone then throttle_group_unregister_tgm()
      will attempt to destroy them again, crashing QEMU.

While b) could be fixed easily by allowing the timers to be freed
twice, this would result in a situation in which we can no longer
guarantee that a valid ThrottleState has a valid AioContext and
timers.

This patch ensures that the timers and AioContext are always valid
when I/O limits are set, regardless of whether the BlockBackend has a
BlockDriverState inserted or not.

[Fixed "There'a" typo as suggested by Max Reitz <mreitz@redhat.com>
--Stefan]

Reported-by: sochin jiang <sochin.jiang@huawei.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: e089c66e7c20289b046d782cea4373b765c5bc1d.1510339534.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-13 15:43:49 +00:00
Alberto Garcia 48bf7ea81a block: Check for inserted BlockDriverState in blk_io_limits_disable()
When you set I/O limits using block_set_io_throttle or the command
line throttling.* options they are kept in the BlockBackend regardless
of whether a BlockDriverState is attached to the backend or not.

Therefore when removing the limits using blk_io_limits_disable() we
need to check if there's a BDS before attempting to drain it, else it
will crash QEMU. This can be reproduced very easily using HMP:

     (qemu) drive_add 0 if=none,throttling.iops-total=5000
     (qemu) drive_del none0

Reported-by: sochin jiang <sochin.jiang@huawei.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 0d3a67ce8d948bb33e08672564714dcfb76a3d8c.1510339534.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-13 14:38:46 +00:00
Stefan Hajnoczi dc868fb03b throttle-groups: drain before detaching ThrottleState
I/O requests hang after stop/cont commands at least since QEMU 2.10.0
with -drive iops=100:

  (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
  (qemu) stop
  (qemu) cont
  ...I/O is stuck...

This happens because blk_set_aio_context() detaches the ThrottleState
while requests may still be in flight:

  if (tgm->throttle_state) {
      throttle_group_detach_aio_context(tgm);
      throttle_group_attach_aio_context(tgm, new_context);
  }

This patch encloses the detach/attach calls in a drained region so no
I/O request is left hanging.  Also add assertions so we don't make the
same mistake again in the future.

Reported-by: Yongxue Hong <yhong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20171110151934.16883-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-13 14:02:09 +00:00
Zhengui 632a773543 block: all I/O should be completed before removing throttle timers.
In blk_remove_bs, all I/O should be completed before removing throttle
timers. If there has inflight I/O, removing throttle timers here will
cause the inflight I/O never return.
This patch add bdrv_drained_begin before throttle_timers_detach_aio_context
to let all I/O completed before removing throttle timers.

[Moved declaration of bs as suggested by Alberto Garcia
<berto@igalia.com>.
--Stefan]

Signed-off-by: Zhengui <lizhengui@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1508564040-120700-1-git-send-email-lizhengui@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-13 14:02:05 +00:00
Eric Blake b4176cb314 nbd-client: Stricter enforcing of structured reply spec
Ensure that the server is not sending unexpected chunk lengths
for either the NONE or the OFFSET_DATA chunk, nor unexpected
hole length for OFFSET_HOLE.  This will flag any server as
broken that responds to a zero-length read with an OFFSET_DATA
(what our server currently does, but that's about to be fixed)
or with OFFSET_HOLE, even though we previously fixed our client
to never be able to send such a request over the wire.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09 10:22:26 -06:00
Eric Blake 9d8f818cde nbd-client: Short-circuit 0-length operations
The NBD spec was recently clarified to state that clients should
not send 0-length requests to the server, as the server behavior
is undefined [1].  We know that qemu-nbd's behavior is a successful
no-op (once it has filtered for read-only exports), but other NBD
implementations might return an error.  To avoid any questionable
server implementations, it is better to just short-circuit such
requests on the client side (we are relying on the block layer to
already filter out requests such as invalid offset, write to a
read-only volume, and so forth); do the short-circuit as late as
possible to still benefit from protections from assertions that
the block layer is not violating our assumptions.

[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-6-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09 10:18:31 -06:00
Eric Blake 1104d83c72 nbd-client: Refuse read-only client with BDRV_O_RDWR
The NBD spec says that clients should not try to write/trim to
an export advertised as read-only by the server.  But we failed
to check that, and would allow the block layer to use NBD with
BDRV_O_RDWR even when the server is read-only, which meant we
were depending on the server sending a proper EPERM failure for
various commands, and also exposes a leaky abstraction: using
qemu-io in read-write mode would succeed on 'w -z 0 0' because
of local short-circuiting logic, but 'w 0 0' would send a
request over the wire (where it then depends on the server, and
fails at least for qemu-nbd but might pass for other NBD
implementations).

With this patch, a client MUST request read-only mode to access
a server that is doing a read-only export, or else it will get
a message like:

can't open device nbd://localhost:10809/foo: request for write access conflicts with read-only export

It is no longer possible to even attempt writes over the wire
(including the corner case of 0-length writes), because the block
layer enforces the explicit read-only request; this matches the
behavior of qcow2 when backed by a read-only POSIX file.

Fix several iotests to comply with the new behavior (since
qemu-nbd of an internal snapshot, as well as nbd-server-add over QMP,
default to a read-only export, we must tell blockdev-add/qemu-io to
set up a read-only client).

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09 10:10:17 -06:00
Eric Blake e659fb3b99 nbd-client: Fix error message typos
Provide missing spaces that are required when using string
concatenation to break error messages across source lines.
Introduced in commit f140e300.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171108215703.9295-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09 10:09:38 -06:00
Vladimir Sementsov-Ogievskiy f140e30003 nbd: Minimal structured read for client
Minimal implementation: for structured error only error_report error
message.

Note that test 83 is now more verbose, because the implementation
prints more warnings about unexpected communication errors; perhaps
future patches should tone things down by using trace messages
instead of traces, but the common case of successful communication
is no noisier than before.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-13-eblake@redhat.com>
2017-10-30 21:48:41 +01:00
Vladimir Sementsov-Ogievskiy d2febedb45 nbd/client: prepare nbd_receive_reply for structured reply
In following patch nbd_receive_reply will be used both for simple
and structured reply header receiving.
NBDReply is altered into union of simple reply header and structured
reply chunk header, simple error translation moved to block/nbd-client
to be consistent with further structured reply error translation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171027104037.8319-11-eblake@redhat.com>
2017-10-30 21:48:32 +01:00
Max Reitz 572b07bea1 qcow2: Always execute preallocate() in a coroutine
Some qcow2 functions (at least perform_cow()) expect s->lock to be
taken.  Therefore, if we want to make use of them, we should execute
preallocate() (as "preallocate_co") in a coroutine so that we can use
the qemu_co_mutex_* functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171009215533.12530-3-mreitz@redhat.com
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-26 15:01:14 +02:00
Max Reitz e400ad1e1f qcow2: Fix unaligned preallocated truncation
A qcow2 image file's length is not required to have a length that is a
multiple of the cluster size.  However, qcow2_refcount_area() expects an
aligned value for its @start_offset parameter, so we need to round
@old_file_size up to the next cluster boundary.

Reported-by: Ping Li <pingl@redhat.com>
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1414049
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171009215533.12530-2-mreitz@redhat.com
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-26 15:01:14 +02:00
Max Reitz 233521b199 qcow2: Emit errp when truncating the image tail
bdrv_truncate() has an errp parameter which is always set when an error
occurs.  Let's use that instead of a plain strerror().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171009155431.14093-1-mreitz@redhat.com
Reviewed-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-26 15:01:14 +02:00
Alberto Garcia a35f87f50d qcow2: Use BDRV_SECTOR_BITS instead of its literal value
BDRV_SECTOR_BITS is defined to be 9 in block.h (and BDRV_SECTOR_SIZE
is calculated from that), but there are still a couple of places where
we are using the literal value instead of the macro.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20171009153856.20387-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-26 15:01:13 +02:00
Eric Blake 8cbf74b23c qcow2: Reduce is_zero() rounding
Now that bdrv_is_allocated accepts non-aligned inputs, we can
remove the TODO added in earlier refactoring.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 88e63df214 block: Reduce bdrv_aligned_preadv() rounding
Now that bdrv_is_allocated accepts non-aligned inputs, we can
remove the TODO added in commit d6a644bb.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake efa6e2ed64 block: Align block status requests
Any device that has request_alignment greater than 512 should be
unable to report status at a finer granularity; it may also be
simpler for such devices to be guaranteed that the block layer
has rounded things out to the granularity boundary (the way the
block layer already rounds all other I/O out).  Besides, getting
the code correct for super-sector alignment also benefits us
for the fact that our public interface now has byte granularity,
even though none of our drivers have byte-level callbacks.

Add an assertion in blkdebug that proves that the block layer
never requests status of unaligned sections, similar to what it
does on other requests (while still keeping the generic helper
in place for when future patches add a throttle driver).  Note
that iotest 177 already covers this (it would fail if you use
just the blkdebug.c hunk without the io.c changes).  Meanwhile,
we can drop assertions in callers that no longer have to pass
in sector-aligned addresses.

There is a mid-function scope added for 'count' and 'longret',
for a couple of reasons: first, an upcoming patch will add an
'if' statement that checks whether a driver has an old- or
new-style callback, and can conveniently use the same scope for
less indentation churn at that time.  Second, since we are
trying to get rid of sector-based computations, wrapping things
in a scope makes it easier to group and see what will be
deleted in a final cleanup patch once all drivers have been
converted to the new-style callback.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 3182664220 block: Convert bdrv_get_block_status_above() to bytes
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status_above()
to bdrv_block_status_above() ensures that the compiler enforces that
all callers are updated.  Likewise, since it a byte interface allows
an offset mapping that might not be sector aligned, split the mapping
out of the return value and into a pass-by-reference parameter.  For
now, the io.c layer still assert()s that all uses are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status in the drivers.

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status(), plus
updates for the new split return interface.  But some code,
particularly bdrv_block_status(), gets a lot simpler because it no
longer has to mess with sectors.  Likewise, mirror code no longer
computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop
an assertion about alignment because the loop no longer depends on
alignment (never mind that we don't really have a driver that
reports sub-sector alignments, so it's not really possible to test
the effect of sub-sector mirroring).  Fix a neighboring assertion to
use is_power_of_2 while there.

For ease of review, bdrv_get_block_status() was tackled separately.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 5b648c67e3 block: Switch bdrv_co_get_block_status_above() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 7ddb99b9dc block: Switch bdrv_common_block_status_above() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 4bcd936e47 block: Switch BdrvCoGetBlockStatusData to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 2e8bc7874b block: Switch bdrv_co_get_block_status() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change); and as with its public counterpart,
rename to bdrv_co_block_status() and split the offset return, to
make the compiler enforce that we catch all uses.  For now, we
assert that callers and the return value still use aligned data,
but ultimately, this will be the function where we hand off to a
byte-based driver callback, and will eventually need to add logic
to ensure we round calls according to the driver's
request_alignment then touch up the result handed back to the
caller, to start permitting a caller to pass unaligned offsets.

Note that we are now prepared to accepts 'bytes' larger than INT_MAX;
this is okay as long as we clamp things internally before violating
any 32-bit limits, and makes no difference to how a client will
use the information (clients looping over the entire file must
already be prepared for consecutive calls to return the same status,
as drivers are already free to return shorter-than-maximal status
due to any other convenient split points, such as when the L2 table
crosses cluster boundaries in qcow2).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 237d78f8fc block: Convert bdrv_get_block_status() to bytes
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status() to
bdrv_block_status() ensures that the compiler enforces that all
callers are updated.  For now, the io.c layer still assert()s that
all callers are sector-aligned, but that can be relaxed when a later
patch implements byte-based block status in the drivers.

There was an inherent limitation in returning the offset via the
return value: we only have room for BDRV_BLOCK_OFFSET_MASK bits, which
means an offset can only be mapped for sector-aligned queries (or,
if we declare that non-aligned input is at the same relative position
modulo 512 of the answer), so the new interface also changes things to
return the offset via output through a parameter by reference rather
than mashed into the return value.  We'll have some glue code that
munges between the two styles until we finish converting all uses.

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status(), coupled
with the tweak in calling convention.  But some code, particularly
bdrv_is_allocated(), gets a lot simpler because it no longer has to
mess with sectors.

For ease of review, bdrv_get_block_status_above() will be tackled
separately.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 7286d6106f block: Switch bdrv_make_zero() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of zeroing a device to track by bytes instead of
sectors (although we are still guaranteed that we iterate by steps
that are sector-aligned).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake f06f6b66c7 qcow2: Switch is_zero_sectors() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and rename it to is_zero() in the
process.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 7cfd527525 block: Make bdrv_round_to_clusters() signature more useful
In the process of converting sector-based interfaces to bytes,
I'm finding it easier to represent a byte count as a 64-bit
integer at the block layer (even if we are internally capped
by SIZE_MAX or even INT_MAX for individual transactions, it's
still nicer to not have to worry about truncation/overflow
issues on as many variables).  Update the signature of
bdrv_round_to_clusters() to uniformly use int64_t, matching
the signature already chosen for bdrv_is_allocated and the
fact that off_t is also a signed type, then adjust clients
according to the required fallout (even where the result could
now exceed 32 bits, no client is directly assigning the result
into a 32-bit value without breaking things into a loop first).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake c9ce8c4da6 block: Add flag to avoid wasted work in bdrv_is_allocated()
Not all callers care about which BDS owns the mapping for a given
range of the file, or where the zeroes lie within that mapping.  In
particular, bdrv_is_allocated() cares more about finding the
largest run of allocated data from the guest perspective, whether
or not that data is consecutive from the host perspective, and
whether or not the data reads as zero.  Therefore, doing subsequent
refinements such as checking how much of the format-layer
allocation also satisfies BDRV_BLOCK_ZERO at the protocol layer is
wasted work - in the best case, it just costs extra CPU cycles
during a single bdrv_is_allocated(), but in the worst case, it
results in a smaller *pnum, and forces callers to iterate through
more status probes when visiting the entire file for even more
extra CPU cycles.

This patch only optimizes the block layer (no behavior change when
want_zero is true, but skip unnecessary effort when it is false).
Then when subsequent patches tweak the driver callback to be
byte-based, we can also pass this hint through to the driver.

Tweak BdrvCoGetBlockStatusData to declare arguments in parameter
order, rather than mixing things up (minimizing padding is not
necessary here).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 298a1665a2 block: Allow NULL file for bdrv_get_block_status()
Not all callers care about which BDS owns the mapping for a given
range of the file.  This patch merely simplifies the callers by
consolidating the logic in the common call point, while guaranteeing
a non-NULL file to all the driver callbacks, for no semantic change.
The only caller that does not care about pnum is bdrv_is_allocated,
as invoked by vvfat; we can likewise add assertions that the rest
of the stack does not have to worry about a NULL pnum.

Furthermore, this will also set the stage for a future cleanup: when
a caller does not care about which BDS owns an offset, it would be
nice to allow the driver to optimize things to not have to return
BDRV_BLOCK_OFFSET_VALID in the first place.  In the case of fragmented
allocation (for example, it's fairly easy to create a qcow2 image
where consecutive guest addresses are not at consecutive host
addresses), the current contract requires bdrv_get_block_status()
to clamp *pnum to the limit where host addresses are no longer
consecutive, but allowing a NULL file means that *pnum could be
set to the full length of known-allocated data.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Peter Maydell 79b2a13aa8 nbd patches for 2017-10-14
- Marc-André Lureau - NBD: use g_new() family of functions
 - Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read
 -----BEGIN PGP SIGNATURE-----
 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
 
 iQEcBAABCAAGBQJZ4q4XAAoJEKeha0olJ0NqEooH/R8NKYACELA39xrLdEMUQuZY
 1Lm3/OtpBIICKx7OiZ7LniqApAI++FgjNxOf6PAfNG0TmEA+wMFaZ6NJEdi9DAmv
 kJVLsxiqKLDD+WIKMq5XfZQoFMJ8rV8W2/BYx9cF3Pl4KMT20qDsumsncZJ7DGOR
 jjsbAI8Q6g45VBx6TJbxXiTMDj87nIyNaydAGzRQTmEHtnmh8mllPiuEhJu24l6G
 7CQKfcu4/7Te/5PvJIPn7CxHdVjLYalgWDRkU3kXcwmO8vGQEkYoiHPoc8lGsGtw
 oXJ2YIODYBIjeICkF0/PjT9aoeJQG8EuHR1hT0CW5dVBZz/DlVP/j+EZ6IDV/8k=
 =ud0Z
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-10-14' into staging

nbd patches for 2017-10-14

- Marc-André Lureau - NBD: use g_new() family of functions
- Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read

# gpg: Signature made Sun 15 Oct 2017 01:38:47 BST
# gpg:                using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2017-10-14:
  nbd: header constants indenting
  nbd/server: simplify reply transmission
  nbd/server: refactor nbd_co_send_simple_reply parameters
  nbd/server: do not use NBDReply structure
  nbd/server: structurize simple reply header sending
  nbd: rename some simple-request related objects to be _simple_
  block/nbd-client: refactor nbd_co_receive_reply
  block/nbd-client: assert qiov len once in nbd_co_request
  NBD: use g_new() family of functions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-16 15:54:42 +01:00
Manos Pitsidianakis b867eaa17b block/throttle.c: add bdrv_co_drain_begin/end callbacks
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13 12:38:41 +01:00
Manos Pitsidianakis f8ea8dacf0 block: rename bdrv_co_drain to bdrv_co_drain_begin
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13 12:38:41 +01:00
Manos Pitsidianakis 481cad48e5 block: add bdrv_co_drain_end callback
BlockDriverState has a bdrv_co_drain() callback but no equivalent for
the end of the drain. The throttle driver (block/throttle.c) needs a way
to mark the end of the drain in order to toggle io_limits_disabled
correctly, thus bdrv_co_drain_end is needed.

Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13 12:38:41 +01:00
Vladimir Sementsov-Ogievskiy ed397b2fe7 block/nbd-client: refactor nbd_co_receive_reply
Pass handle parameter directly, as the whole request isn't needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171012095319.136610-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-10-12 16:20:27 -05:00
Vladimir Sementsov-Ogievskiy 4bfe4478d1 block/nbd-client: assert qiov len once in nbd_co_request
Also improve the assertion: check that qiov is NULL for other commands
than CMD_READ and CMD_WRITE.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171012095319.136610-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-10-12 16:19:35 -05:00
Eduardo Habkost e5766d6ec7 config: qemu_config_parse() return number of config groups
Change qemu_config_parse() to return the number of config groups
in success and -EINVAL on error. This will allow callers of
qemu_config_parse() to check if something was really loaded from
the config file.

All existing callers of qemu_config_parse() and
qemu_read_config_file() only check if the return value was
negative, so the change shouldn't affect them.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004025043.3788-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Vladimir Sementsov-Ogievskiy ce960aa906 block/mirror: check backing in bdrv_mirror_top_flush
Backing may be zero after failed bdrv_append in mirror_start_job,
which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Pavel Butsykin 163bc39d2c qcow2: truncate the tail of the image file after shrinking the image
Now after shrinking the image, at the end of the image file, there might be a
tail that probably will never be used. So we can find the last used cluster and
cut the tail.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Pavel Butsykin 76a2a30a99 qcow2: fix return error code in qcow2_truncate()
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170929121613.25997-2-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Vladimir Sementsov-Ogievskiy 18775ff326 block/mirror: check backing in bdrv_mirror_top_refresh_filename
Backing may be zero after failed bdrv_attach_child in
bdrv_set_backing_hd, which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange d67a6b09b4 block: support passthrough of BDRV_REQ_FUA in crypto driver
The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver
as a passthrough to the underlying block driver.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-7-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange 4609742a49 block: convert qcrypto_block_encrypt|decrypt to take bytes offset
Instead of sector offset, take the bytes offset when encrypting
or decrypting data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-6-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange a73466fbad block: convert crypto driver to bdrv_co_preadv|pwritev
Make the crypto driver implement the bdrv_co_preadv|pwritev
callbacks, and also use bdrv_co_preadv|pwritev for I/O
with the protocol driver beneath. This replaces sector based
I/O with byte based I/O, and allows us to stop assuming the
physical sector size matches the encryption sector size.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-5-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange 31376555c7 block: fix data type casting for crypto payload offset
The crypto APIs report the offset of the data payload as an uint64_t
type, but the block driver is casting to size_t or ssize_t which will
potentially truncate.

Most of the block APIs use int64_t for offsets meanwhile, so even if
using uint64_t in the crypto block driver we are still at risk of
truncation.

Change the block crypto driver to use uint64_t, but add asserts that
the value is less than INT64_MAX.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-4-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange 161253e2d0 block: use 1 MB bounce buffers for crypto instead of 16KB
Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage which high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.

On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-2-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Eric Blake cb2e28780c block: Perform copy-on-read in loop
Improve our braindead copy-on-read implementation.  Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G.  While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read

Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits.  Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.

Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake d855ebcd3c block: Add blkdebug hook for copy-on-read
Make it possible to inject errors on writes performed during a
read operation due to copy-on-read semantics.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 9cdcfd9f7a block: Uniform handling of 0-length bdrv_get_block_status()
Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.

Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled.  While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop).  However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf bde70715b6 commit: Remove overlay_bs
We don't need to make any assumptions about the graph layout above the
top node of the commit operation any more. Remove the use of
bdrv_find_overlay() and related variables from the commit job code.

bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
we can just drop it.

The overlay node was previously added to the block job to get a
BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
bdrv_drop_intermediate() now, but as long as we haven't figured out yet
how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
comment there.

With this change, it is now possible to perform another block job on an
overlay node without conflicts. qemu-iotests 030 is changed accordingly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf 61f09cea01 commit: Support multiple roots above top node
This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.

This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.

On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and as such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.

Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake ca75962244 dirty-bitmap: Convert internal hbitmap size/granularity
Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity.  It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 0fdf1a4f68 dirty-bitmap: Switch bdrv_set_dirty() to bytes
Both callers already had bytes available, but were scaling to
sectors.  Move the scaling to internal code.  In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 49d741b504 qcow2: Switch store_bitmap_data() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

iotests 165 was rather weak - on a default 64k-cluster image, where
bitmap granularity also defaults to 64k bytes, a single cluster of
the bitmap table thus covers (64*1024*8) bits which each cover 64k
bytes, or 32G of image space.  But the test only uses a 1G image,
so it cannot trigger any more than one loop of the code in
store_bitmap_data(); and it was writing to the first cluster.  In
order to test that we are properly aligning which portions of the
bitmap are being written to the file, we really want to test a case
where the first dirty bit returned by bdrv_dirty_iter_next() is not
aligned to the start of a cluster, which we can do by modifying the
test to write data that doesn't happen to fall in the first cluster
of the image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake ab94db6f76 qcow2: Switch load_bitmap_data() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake b85ee45334 qcow2: Switch qcow2_measure() to byte-based iteration
This is new code, but it is easier to read if it makes passes over
the image using bytes rather than sectors (and will get easier in
the future when bdrv_get_block_status is converted to byte-based).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 23ca459a45 mirror: Switch mirror_dirty_init() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake e0d7f73e63 dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere.  Making the change
will also make it easier to write the hold-out callers to use byte
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap.  Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 3b5d4df0c6 dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes
Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration.  Both
callers were already using the result as a bool, so make that
explicit.  Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 9a46dba7b7 dirty-bitmap: Change bdrv_get_dirty_count() to report bytes
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake f798184cfd dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later).  Update the interface to
do the scaling internally instead.

In qcow2-bitmap, the code was specifically checking for an error
return of -1.  To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value.  It's easy enough to make both
changes at the same time, even though either one in isolation
would work.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 715a74d819 dirty-bitmap: Set iterator start by offset, not sector
All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.

Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration.  Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake c7e7c87ac8 qcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the qcow2 bitmap
helper function sectors_covered_by_bitmap_cluster(), renaming it
to bytes_covered_by_bitmap_cluster() in the process.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 86f6ae67e1 dirty-bitmap: Change bdrv_dirty_bitmap_*serialize*() to take bytes
Right now, the dirty-bitmap code exposes the fact that we use
a scale of sector granularity in the underlying hbitmap to anything
that wants to serialize a dirty bitmap.  It's nicer to uniformly
expose bytes as our dirty-bitmap interface, matching the previous
change to bitmap size.  The only caller to serialization is currently
qcow2-cluster.c, which becomes a bit more verbose because it is still
tracking sectors for other reasons, but a later patch will fix that
to more uniformly use byte offsets everywhere.  Likewise, within
dirty-bitmap, we have to add more assertions that we are not
truncating incorrectly, which can go away once the internal hbitmap
is byte-based rather than sector-based.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 993e6525bf dirty-bitmap: Track bitmap size by bytes
We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is (mostly) an internal-only variable (remember, the size
is how many bytes are covered by the bitmap, not how many bytes the
bitmap occupies).  A later cleanup will convert dirty bitmap
internals to be entirely byte-based, eliminating the intermediate
sector rounding added here; and technically, since bdrv_getlength()
already rounds up to sectors, our use of DIV_ROUND_UP is more for
theoretical completeness than for any actual rounding.

Use is_power_of_2() while at it, instead of open-coding that.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake ebfcd2e75f dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes
We're already reporting bytes for bdrv_dirty_bitmap_granularity();
mixing bytes and sectors in our return values is a recipe for
confusion.  A later cleanup will convert dirty bitmap internals
to be entirely byte-based, but in the meantime, we should report
the bitmap size in bytes.

The only external caller in qcow2-bitmap.c is temporarily more verbose
(because it is still using sector-based math), but will later be
switched to track progress by bytes instead of sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00