nvme_init_queue() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState to simplify.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-12-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu_try_blockalign() is a generic API that call back to the
block driver to return its page alignment. As we call from
within the very same driver, we already know to page alignment
stored in our state. Remove indirections and use the value from
BDRVNVMeState.
This change is required to later remove the BlockDriverState
argument, to make nvme_init_queue() per hardware, and not per
block driver.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-11-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the next commit we'll get rid of qemu_try_blockalign().
To ease review, first replace qemu_try_blockalign0() by explicit
calls to qemu_try_blockalign() and memset().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-10-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We allocate an unique chunk of memory then use it for two
different structures. By using an union, we make it clear
the data is overlapping (and we can remove the casts).
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-9-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are going to modify the code in the next commit. Renaming
the 'resp' variable to 'id' first makes the next commit easier
to review. No logical changes.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-8-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Rearrange nvme_add_io_queue() by using a common error path.
This will be proven useful in few commits where we add IRQ
notification to the IO queues.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-7-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Do not use the same error message for different failures.
Display a different error whether it is the CQ or the SQ.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-6-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use definitions instead of '0' or '1' indexes. Also this will
be useful when using multi-queues later.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-5-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As nvme_create_queue_pair() is allowed to fail, replace the
alloc() calls by try_alloc() to avoid aborting QEMU.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-4-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is
not enabled. This is an untested intend of performance optimization.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-3-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use self-explicit SCALE_MS definition instead of magic value.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-2-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes nbd's connection_co yield during reconnects, so that
reconnect doesn't block the main thread. This is very important in
case of an unavailable nbd server host: connect() call may take a long
time, blocking the main thread (and due to reconnect, it will hang
again and again with small gaps of working time during pauses between
connection attempts).
Realization notes:
- We don't want to implement non-blocking connect() over non-blocking
socket, because getaddrinfo() doesn't have portable non-blocking
realization anyway, so let's just use a thread for both getaddrinfo()
and connect().
- We can't use qio_channel_socket_connect_async (which behaves
similarly and starts a thread to execute connect() call), as it's relying
on someone iterating main loop (g_main_loop_run() or something like
this), which is not always the case.
- We can't use thread_pool_submit_co API, as thread pool waits for all
threads to finish (but we don't want to wait for blocking reconnect
attempt on shutdown.
So, we just create the thread by hand. Some additional difficulties
are:
- We want our connect to avoid blocking drained sections and aio context
switches. To achieve this, we make it possible to "cancel" synchronous
wait for the connect (which is a coroutine yield actually), still,
the thread continues in background, and if successful, its result may be
reused on next reconnect attempt.
- We don't want to wait for reconnect on shutdown, so there is
CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block
layer is no longer interested in a result, and thread should close new
connected socket on finish and free the state.
How to reproduce the bug, fixed with this commit:
1. Create an image on node1:
qemu-img create -f qcow2 xx 100M
2. Start NBD server on node1:
qemu-nbd xx
3. Start vm with second nbd disk on node2, like this:
./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \
-vnc :0 -qmp stdio -m 2G -enable-kvm -vga std
4. Access the vm through vnc (or some other way?), and check that NBD
drive works:
dd if=/dev/sdb of=/dev/null bs=1M count=10
- the command should succeed.
5. Now, let's trigger nbd-reconnect loop in Qemu process. For this:
5.1 Kill NBD server on node1
5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest
again. The command should fail and a lot of error messages about
failing disk may appear as well.
Now NBD client driver in Qemu tries to reconnect.
Still, VM works well.
6. Make node1 unavailable on NBD port, so connect() from node2 will
last for a long time:
On node1 (Note, that 10809 is just a default NBD port):
sudo iptables -A INPUT -p tcp --dport 10809 -j DROP
After some time the guest hangs, and you may check in gdb that Qemu
hangs in connect() call, issued from the main thread. This is the
BUG.
7. Don't forget to drop iptables rule from your node1:
sudo iptables -D INPUT -p tcp --dport 10809 -j DROP
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: minor wording and formatting tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
The NVM Express specification generally uses 'zeroes' and not 'zeros',
so let us align with it.
Cc: Fam Zheng <fam@euphon.net>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Add missing fields in the Identify Controller and Identify Namespace
data structures to bring them in line with NVMe v1.3.
This also adds data structures and defines for SGL support which
requires a couple of trivial changes to the nvme block driver as well.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-2-its@irrelevant.dk>
Remove superfluous breaks, as there is a "return" before them.
Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1594631107-36574-1-git-send-email-wang.yi59@zte.com.cn>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The qcow2 driver needs the zlib dependency. While emulators
provided it through the migration code, this is not true of
the tools. Move the dependency from the qcow1 rule directly
into block_ss so that it is included unconditionally.
Fixes build with --disable-qcow1.
Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move typedef closer to the type check macros, to make it easier
to convert the code to OBJECT_DEFINE_TYPE() in the future.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-17-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This function is only used by qcow2_expand_zero_clusters() to
downgrade a qcow2 image to a previous version. This would require
transforming all extended L2 entries into normal L2 entries but this
is not a simple task and there are no plans to implement this at the
moment.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <15e65112b4144381b4d8c0bdf8fb76b0d813e3d1.1594396418.git.berto@igalia.com>
[mreitz: Fixed comment style]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Traditional qcow2 images don't allow preallocation if a backing file
is set. This is because once a cluster is allocated there is no way to
tell that its data should be read from the backing file.
Extended L2 entries have individual allocation bits for each
subcluster, and therefore it is perfectly possible to have an
allocated cluster with all its subclusters unallocated.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6d5b0f38e7dc5f2f31d8cab1cb92044e9909aece.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com>
[mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output]
Signed-off-by: Max Reitz <mreitz@redhat.com>
This field allows us to indicate that the L2 metadata update does not
come from a write request with actual data but from a preallocation
request.
For traditional images this does not make any difference, but for
images with extended L2 entries this means that the clusters are
allocated normally in the L2 table but individual subclusters are
marked as unallocated.
This will allow preallocating images that have a backing file.
There is one special case: when we resize an existing image we can
also request that the new clusters are preallocated. If the image
already had a backing file then we have to hide any possible stale
data and zero out the new clusters (see commit 955c7d6687 for more
details).
In this case the subclusters cannot be left as unallocated so the L2
bitmap must be updated.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <960d4c444a4f5a870e2b47e5da322a73cd9a2f5a.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <7efae2efd5e36b42d2570743a12576d68ce53685.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This works now at the subcluster level and pwrite_zeroes_alignment is
updated accordingly.
qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with
the following changes:
- The request can now be subcluster-aligned.
- The cluster-aligned body of the request is still zeroized using
zero_in_l2_slice() as before.
- The subcluster-aligned head and tail of the request are zeroized
with the new zero_l2_subclusters() function.
There is just one thing to take into account for a possible future
improvement: compressed clusters cannot be partially zeroized so
zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
This makes the caller repeat the *complete* request and write actual
zeroes to disk. This is sub-optimal because
1) if the head area was compressed we would still be able to use
the fast path for the body and possibly the tail.
2) if the tail area was compressed we are writing zeroes to the
head and the body areas, which are already zeroized.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <17e05e2ee7e12f10dcf012da81e83ebe27eb3bef.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.
A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <b3dc97e8e2240ddb5191a4f930e8fc9653f94621.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Compressed clusters always have the bitmap part of the extended L2
entry set to 0.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <04455b3de5dfeb9d1cfe1fc7b02d7060a6e09710.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated. This needs to happen even if the
cluster was already allocated and the L2 entry was otherwise valid.
In some cases however a write operation doesn't need change the L2
bitmap (because all affected subclusters were already allocated). This
is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2()
is never called in those cases.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <0875620d49f44320334b6a91c73b3f301f975f38.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The offset field of an uncompressed cluster's L2 entry must be aligned
to the cluster size, otherwise it is invalid. If the cluster has no
data then it means that the offset points to a preallocation, so we
can clear the offset field without affecting the guest-visible data.
This is what 'qemu-img check' does when run in repair mode.
On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO
is set, and repairing such entries turns the clusters from ZERO_ALLOC
into ZERO_PLAIN.
Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO
but the idea is the same: if none of the subclusters are allocated
then we can clear the offset field and leave the bitmap untouched.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <9f4ed1d0a34b0a545b032c31ecd8c14734065342.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Two things need to be taken into account here:
1) With full_discard == true the L2 entry must be cleared completely.
This also includes the L2 bitmap if the image has extended L2
entries.
2) With full_discard == false we have to make the discarded cluster
read back as zeroes. With normal L2 entries this is done with the
QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done
with the individual 'all zeroes' bits for each subcluster.
Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2
images so, if there is a backing file, discard cannot guarantee
that the image will read back as zeroes. If this is important for
the caller it should forbid it as qcow2_co_pdiscard() does (see
80f5c01183 for more details).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <5ef8274e628aa3ab559bfac467abf488534f2b76.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as
zeroes is only used in standard L2 entries. Extended L2 entries use
individual 'all zeroes' bits for each subcluster.
This must be taken into account when updating the L2 entry and also
when deciding that an existing entry does not need to be updated.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <b61d61606d8c9b367bd641ab37351ddb9172799a.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.
qcow2_cluster_to_subcluster_type() is not necessary as a separate
function anymore so it's inlined into its caller.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <d2193fd48653a350d80f0eca1c67b1d9053fb2f3.1594396418.git.berto@igalia.com>
[mreitz: Initialize expected_type to anything]
Signed-off-by: Max Reitz <mreitz@redhat.com>
If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:
1) If we are writing to a newly allocated cluster then we need
copy-on-write. The previous contents of subclusters #0 to #3 must
be copied to the new cluster. We can optimize this process by
skipping all leading unallocated or zero subclusters (the status of
those skipped subclusters will be reflected in the new L2 bitmap).
2) If we are overwriting an existing cluster:
2.1) If subcluster #3 is unallocated or has the all-zeroes bit set
then we need copy-on-write (on subcluster #3 only).
2.2) If subcluster #3 was already allocated then there is no need
for any copy-on-write. However we still need to update the L2
bitmap to reflect possible changes in the allocation status of
subclusters #4 to #31. Because of this, this function checks
if all the overwritten subclusters are already allocated and
in this case it returns without creating a new QCowL2Meta
structure.
After all these changes l2meta_cow_start() and l2meta_cow_end()
are not necessarily cluster-aligned anymore. We need to update the
calculation of old_start and old_end in handle_dependencies() to
guarantee that no two requests try to write on the same cluster.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <4292dd56e4446d386a2fe307311737a711c00708.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.
This patch handles that value in all places where subcluster types
are processed.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <bf09e2e2439a468a901bb96ace411eed9ee50295.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.
qcow2_get_host_offset() is modified to return the subcluster type
instead of the cluster type, and all callers are updated to replace
all values of QCow2ClusterType with their QCow2SubclusterType
equivalents.
This patch only changes the data types, there are no semantic changes.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <f6c29737c295f32cbee74c903c30b01820363b34.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This function returns an integer that can be either an error code or a
cluster type (a value from the QCow2ClusterType enum).
We are going to start using subcluster types instead of cluster types
in some functions so it's better to use the exact data types instead
of integers for clarity and in order to detect errors more easily.
This patch makes qcow2_get_host_offset() return 0 on success and
puts the returned cluster type in a separate parameter. There are no
semantic changes.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <396b6eab1859a271551dcd7dcba77f8934aa3c3f.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This helper function tells us if a cluster is allocated (that is,
there is an associated host offset for it).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6d8771c5c79cbdc6c519875a5078e1cc85856d63.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
There are situations in which we want to know how many contiguous
subclusters of the same type there are in a given cluster. This can be
done by simply iterating over the subclusters and repeatedly calling
qcow2_get_subcluster_type() for each one of them.
However once we determined the type of a subcluster we can check the
rest efficiently by counting the number of adjacent ones (or zeroes)
in the bitmap. This is what this function does.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <db917263d568ec6ffb4a41cac3c9100f96bf6c18.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
the same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.
In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.
With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.
There are several things that have to be taken into account:
a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
compressed. We do not support compression at the subcluster
level.
b) There are two different values for unallocated subclusters:
QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
which means that the cluster is allocated but the subcluster is
not. The latter can only happen in images with extended L2
entries.
c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
entry has a value that violates the specification. The caller is
responsible for handling these situations.
To prevent compatibility problems with images that have invalid
values but are currently being read by QEMU without causing side
effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
with extended L2 entries.
qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <26ef38e270f25851c98b51278852b4c4a7f97e69.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.
In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.
This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters. In this case the
returned value is always 0 and has no meaning.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6ee0f81ae3329c991de125618b3675e1e46acdbb.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.
Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <d34d578bd0380e739e2dde3e8dd6187d3d249fa9.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Like offset_into_cluster() and size_to_clusters(), but for
subclusters.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <3cc2390dcdef3d234d47c741b708bd8734490862.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <56e3e4ac0d827c6a2f5f259106c5ddb7c4ca2653.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch adds the following new fields to BDRVQcow2State:
- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size
Images without subclusters are treated as if they had exactly one
subcluster per cluster (i.e. subcluster_size = cluster_size).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <55bfeac86b092fa2c9d182a95cbeb479ff7eca4f.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This function will be used by the qcow2 code to check if an image has
subclusters or not.
At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <905526221083581a1b7057bca1585487661c5c13.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.
This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <9586363531fec125ba1386e561762d3e4224e9fc.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:
- handle_copied() looks for normal data clusters that are already
allocated and have a reference count of 1. In those clusters we
can simply write the data and there is no need to perform any
copy-on-write.
- handle_alloc() looks for clusters that do need copy-on-write,
either because they haven't been allocated yet, because their
reference count is != 1 or because they are ZERO_ALLOC clusters.
The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).
In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.
This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:
- handle_copied() looks for clusters that are already allocated and
which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
reference count of 1).
- handle_alloc() looks for clusters for which we need a new
allocation (all other cases).
One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <eb17fc938f6be7be2e8d8ff42763d2c19241f866.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to need it in other places.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <65e5d9627ca2ebe7e62deaeddf60949c33067d9d.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.
This patch moves that code to a separate function so it can be used
from other places.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <e5bc4a648dac31972bfa7a0e554be8064be78799.1594396418.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>