Commit Graph

57 Commits

Author SHA1 Message Date
Eric Blake
8341f00dc2 block: Allow BDRV_REQ_FUA through blk_pwrite()
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.

This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
72e775c7d9 block: Always set writeback mode in blk_new_open()
All callers of blk_new_open() either don't rely on the WCE bit set after
blk_new_open() because they explicitly set it anyway, or they pass
BDRV_O_CACHE_WB unconditionally.

This patch changes blk_new_open() so that it always enables writeback
mode and asserts that BDRV_O_CACHE_WB is clear. For those callers that
used to pass BDRV_O_CACHE_WB unconditionally, the flag is removed now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Markus Armbruster
da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Max Reitz
efaa7c4eeb blockdev: Split monitor reference from BB creation
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.

In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".

If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Kevin Wolf
10bf03af12 vhdx: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
6340472c54 block: Use writeback in .bdrv_create() implementations
There's no reason to use a writethrough cache mode while creating an
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Max Reitz
04a3615860 vhdx: Simplify vhdx_set_shift_bits()
For values which are powers of two (and we do assume all of these to
be), sizeof(x) * 8 - 1 - clz(x) == ctz(x). Therefore, use ctz().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-3-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Max Reitz
939901dcd2 vhdx: DIV_ROUND_UP() in vhdx_calc_bat_entries()
We have DIV_ROUND_UP(), so we can use it to produce more easily readable
code. It may be slower than the bit shifting currently performed
(because it actually performs a division), but since
vhdx_calc_bat_entries() is never used in a hot path, this is completely
fine.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-2-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Peter Maydell
80c71a241a block: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-20 13:36:23 +01:00
Kevin Wolf
9a4f4c3156 block: Convert bs->file to BdrvChild
This patch removes the temporary duplication between bs->file and
bs->file_child by converting everything to BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Max Reitz
6ebf9aa2ef block: Drop drv parameter from bdrv_open()
Now that this parameter is effectively unused, we can drop it and just
pass NULL on to bdrv_open_inherit().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:36 +02:00
Daniel P. Berrange
a8f15a2775 maint: remove double semicolons in many files
A number of source files have statements accidentally
terminated by a double semicolon - eg 'foo = bar;;'.
This is harmless but a mistake none the less.

The tcg/ia64/tcg-target.c file is whitelisted because
it has valid use of ';;' in a comment containing assembly
code.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 10:21:38 +03:00
Alberto Garcia
81e5f78a9f block: use bdrv_get_device_or_node_name() in error messages
There are several error messages that identify a BlockDriverState by
its device name. However those errors can be produced in nodes that
don't have a device name associated.

In those cases we should use bdrv_get_device_or_node_name() to fall
back to the node name and produce a more meaningful message. The
messages are also updated to use the more generic term 'node' instead
of 'device'.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 9823a1f0514fdb0692e92868661c38a9e00a12d6.1428485266.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Kevin Wolf
d1a126c53d vhdx: Fix zero-fill iov length
Fix the length of the zero-fill for the back, which was accidentally
using the same value as for the front. This is caught by qemu-iotests
033.

For consistency, change the code for the front as well to use the length
stored in the iov (it is the same value, copied four lines above).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Jeff Cody <jcody@redhat.com>
2015-04-28 15:36:09 +02:00
Jeff Cody
cdf9634bdf block: vhdx - force FileOffsetMB field to '0' for certain block states
The v1.0.0 spec calls out PAYLOAD_BLOCK_ZERO FileOffsetMB field as being
'reserved'.  In practice, this means that Hyper-V will fail to read a
disk image with PAYLOAD_BLOCK_ZERO block states with a FileOffsetMB
value other than 0.

The other states that indicate a block that is not there
(PAYLOAD_BLOCK_UNDEFINED, PAYLOAD_BLOCK_NOT_PRESENT,
 PAYLOAD_BLOCK_UNMAPPED) have multiple options for what FileOffsetMB may
be set to, and '0' is explicitly called out as an option.

For all the above states, we will also just set the FileOffsetMB value
to 0.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: a9fe92f53f07e6ab1693811e4312c0d1e958500b.1421787566.git.jcody@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-01-23 12:41:32 -05:00
Jeff Cody
85b712c9d5 block: vhdx - set .bdrv_has_zero_init to bdrv_has_zero_init_1
Now that new VHDX images will default to BAT block states of
PAYLOAD_BLOCK_ZERO, we can indicate that VHDX has zero init.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 5e582703e36450b9ca939e2e5c9fa3930030f7fe.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:35:35 +00:00
Jeff Cody
30af51ce7f block: vhdx - change .vhdx_create default block state to ZERO
The VHDX spec specifies that the default new block state is
PAYLOAD_BLOCK_NOT_PRESENT for a dynamic VHDX image, and
PAYLOAD_BLOCK_FULLY_PRESENT for a fixed VHDX image.

However, in order to create space-efficient VHDX images with qemu-img
convert, it is desirable to be able to set has_zero_init to true for
VHDX.

There is currently an option when creating VHDX images, to use block
state ZERO for new blocks.  However, this currently defaults to 'off'.
In order to be able to eventually set has_zero_init to true for VHDX,
this needs to default to 'on'.

This patch changes the default to 'on', and provides some help
information to warn against setting it to 'off' when using qemu-img
convert.

[Max Reitz pointed out that a full stop was missing at the end of the
VHDX_BLOCK_OPT_ZERO option help text.  I have added it.
--Stefan]

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 85164899eacc86e150c3ceba793cf93b398dedd7.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 15:42:49 +00:00
Jeff Cody
a9d1e9daa5 block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec
The 0.95 VHDX spec defined PAYLOAD_BLOCK_UNMAPPED to be 5.  The 1.00
VHDX spec redefines PAYLOAD_BLOCK_UNMAPPED to be 3 instead.

The original value of 5 is now an undefined state in the spec, but it
should be safe to treat it the same and return zeros for data read.
This way, we can maintain compatibility with any images out in the wild
that may have been created in accordance to the 0.95 spec.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 8a4d2da73a8dbc04cde62bea782fc09ff84b1cf1.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 15:42:22 +00:00
Jeff Cody
0571df44a1 block: vhdx - remove redundant comments
Minor cleanup.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: e8718ae3fd3e40a527e46a00e394973fbaab4d53.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 15:42:22 +00:00
Markus Armbruster
bfb197e0d9 block: Eliminate BlockDriverState member device_name[]
device_name[] can become non-empty only in bdrv_new_root() and
bdrv_move_feature_fields().  The latter is used only to undo damage
done by bdrv_swap().  The former is called only by blk_new_with_bs().
Therefore, when a BlockDriverState's device_name[] is non-empty, then
it's been created with a BlockBackend, and vice versa.  Furthermore,
blk_new_with_bs() keeps the two names equal.

Therefore, device_name[] is redundant.  Eliminate it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Peter Maydell
380f649e02 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJUIAsHAAoJEJykq7OBq3PIyR4H/ikrs35Boxv08mR8rTyfpSnQ
 ERQhMnKWIe3cy5pzNG5TKiPliljF0FnkNC3KmBLU5TsoqXeW76WJF//Db5hNnTzG
 FAIeJu2RUSqhjqoz5K6TNYOJGH2XQ+/EZbMyrIeLBwYFn0gFMvZJOVYgpBWP0QTQ
 7sImrlxihUalwwL/6twfE6s5aA12DXN8hlC57u+9nvf+5ocaDyOJ7jUBU2EEhSNr
 TzDTO3gTCSEmnDriwKi3m3mIW/y7kLTXWyGolprZ0UpRyYRSmjcfgBHu8l0X8NCv
 lKkrYuE4V3QIzJk4BWbZQSPjoDlLbH9gnq3H+VwMxMZDxDKtAqAJRw/3H4yWhZ8=
 =JJEF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Mon 22 Sep 2014 12:41:59 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request: (59 commits)
  block: Always compile virtio-blk dataplane
  vring: Better error handling if num is too large
  virtio: Import virtio_vring.h
  async: aio_context_new(): Handle event_notifier_init failure
  block: vhdx - fix reading beyond pointer during image creation
  block: delete cow block driver
  block/archipelago: Fix typo in qemu_archipelago_truncate()
  ahci: Add test_identify case to ahci-test.
  ahci: Add test_hba_enable to ahci-test.
  ahci: Add test_hba_spec to ahci-test.
  ahci: properly shadow the TFD register
  ahci: add test_pci_enable to ahci-test.
  ahci: Add test_pci_spec to ahci-test.
  ahci: MSI capability should be at 0x80, not 0x50.
  ahci: Adding basic functionality qtest.
  layout: Add generators for refcount table and blocks
  fuzz: Add fuzzing functions for entries of refcount table and blocks
  docs: List all image elements currently supported by the fuzzer
  qapi/block-core: Add "new" qcow2 options
  qcow2: Add overlap-check.template option
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-23 12:08:55 +01:00
Jeff Cody
e91a8b2fef block: vhdx - fix reading beyond pointer during image creation
In vhdx_create_metadata(), we allocate 40 bytes to entry_buffer for
the various metadata table entries.  However, we write out 64kB from
that buffer into the new file.  Only write out the correct 40 bytes.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:46 +01:00
Peter Maydell
c2ebb05e25 block/vhdx.c: Mark parent_vhdx_guid variable as unused
The parent_vhdx_guid variable is defined but never used, which provokes
complaints from newer versions of clang. Since the variable definition
is here acting as documentation of the image format, mark it with the
'unused' attribute to keep the compiler happy rather than simply
deleting it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:35:02 +01:00
Adelina Tuvenie
a011898d25 block: allow creation of fixed vhdx images
When trying to create a fixed vhd image qemu-img will return the
following error:

 qemu-img: test.vhdx: Could not create image: Cannot allocate memory

This happens because of a incorrect check in vhdx.c. Specifficaly,
in vhdx_create_bat(), after allocating memory for the BAT entry,
there is a check to determine if the allocation was unsuccsessful.
The error comes from the fact that it checks if s->bat isn't NULL,
which is true in case of succsessful allocation,  and exits with
error ENOMEM.

Signed-off-by: Adelina Tuvenie <atuvenie@cloudbasesolutions.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-22 12:09:31 +04:00
Hu Tao
c2eb918e32 block: round up file size to nearest sector
Currently the file size requested by user is rounded down to nearest
sector, causing the actual file size could be a bit less than the size
user requested. Since some formats (like qcow2) record virtual disk
size in bytes, this can make the last few bytes cannot be accessed.

This patch fixes it by rounding up file size to nearest sector so that
the actual file size is no less than the requested file size.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-12 15:43:06 +02:00
Markus Armbruster
5839e53bbc block: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

Patch created with Coccinelle, with two manual changes on top:

* Add const to bdrv_iterate_format() to keep the types straight

* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
  inexplicably misses

Coccinelle semantic patch:

    @@
    type T;
    @@
    -g_malloc(sizeof(T))
    +g_new(T, 1)
    @@
    type T;
    @@
    -g_try_malloc(sizeof(T))
    +g_try_new(T, 1)
    @@
    type T;
    @@
    -g_malloc0(sizeof(T))
    +g_new0(T, 1)
    @@
    type T;
    @@
    -g_try_malloc0(sizeof(T))
    +g_try_new0(T, 1)
    @@
    type T;
    expression n;
    @@
    -g_malloc(sizeof(T) * (n))
    +g_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc(sizeof(T) * (n))
    +g_try_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_malloc0(sizeof(T) * (n))
    +g_new0(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc0(sizeof(T) * (n))
    +g_try_new0(T, n)
    @@
    type T;
    expression p, n;
    @@
    -g_realloc(p, sizeof(T) * (n))
    +g_renew(T, p, n)
    @@
    type T;
    expression p, n;
    @@
    -g_try_realloc(p, sizeof(T) * (n))
    +g_try_renew(T, p, n)

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Kevin Wolf
a67e128a4f vhdx: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vhdx block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Jeff Cody
4f75b52a07 block: VHDX endian fixes
This patch contains several changes for endian conversion fixes for
VHDX, particularly for big-endian machines (multibyte values in VHDX are
all on disk in LE format).

Tests were done with existing qemu-iotests on an IBM POWER7 (8406-71Y).
This includes sample images created by Hyper-V, both with dirty logs and
without.

In addition, VHDX image files created (and written to) on a BE machine
were tested on a LE machine, and vice-versa.

Reported-by: Markus Armburster <armbru@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Chunyan Liu
c282e1fdf7 cleanup QEMUOptionParameter
Now that all backend drivers are using QemuOpts, remove all
QEMUOptionParameter related codes.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
5366092c7a vhdx.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
83d0521a1e change block layer to support both QemuOpts and QEMUOptionParamter
Change block layer to support both QemuOpts and QEMUOptionParameter.
After this patch, it will change backend drivers one by one. At the end,
QEMUOptionParameter will be removed and only QemuOpts is kept.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:20 +08:00
Jeff Cody
6906046169 block: vhdx - account for identical header sections
The VHDX spec v1.00 declares that "a header is current if it is the only
valid header or if it is valid and its SequenceNumber field is greater
than the other header’s SequenceNumber field. The parser must only use
data from the current header. If there is no current header, then the
VHDX file is corrupt."

However, the Disk2VHD tool from Microsoft creates a VHDX image file that
has 2 identical headers, including matching checksums and matching
sequence numbers.  Likely, as a shortcut the tool is just writing the
header twice, for the active and inactive headers, during the image
creation.  Technically, this should be considered a corrupt VHDX file
(at least per the 1.00 spec, and that is how we currently treat it).

But in order to accomodate images created with Disk2VHD, we can safely
create an exception for this case.  If we find identical sequence
numbers, then we check the VHDXHeader-sized chunks of each 64KB header
sections (we won't rely just on the crc32c to indicate the headers are
the same).  If they are identical, then we go ahead and use the first
one.

Reported-by: Nerijus Baliūnas <nerijus@users.sourceforge.net>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-05-19 11:36:48 +02:00
Jeff Cody
1d7678dec4 vhdx: Bounds checking for block_size and logical_sector_size (CVE-2014-0148)
Other variables (e.g. sectors_per_block) are calculated using these
variables, and if not range-checked illegal values could be obtained
causing infinite loops and other potential issues when calculating
BAT entries.

The 1.00 VHDX spec requires BlockSize to be min 1MB, max 256MB.
LogicalSectorSize is required to be either 512 or 4096 bytes.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 14:19:09 +02:00
Paolo Bonzini
6890aad46b vhdx: correctly propagate errors
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:23 +01:00
Max Reitz
2e40134bfd block: Make bdrv_file_open() static
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.

Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Markus Armbruster
f50159fa9b block/vhdx: Error checking fixes
Errors are inadvertently ignored in a few places.  Has always been
broken.  Spotted by Coverity.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-01-31 22:05:03 +01:00
Max Reitz
72daa72eee block: Allow reference for bdrv_file_open()
Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Jeff Cody
7e30e6a674 block: vhdx - improve error message, and .bdrv_check implementation
If there is a dirty log file to be replayed in a VHDX image, it is
replayed in .vhdx_open().  However, if the file is opened read-only,
then a somewhat cryptic error message results.

This adds a more helpful error message for the user.  If an image file
contains a log to be replayed, and is opened read-only, the user is
instructed to run 'qemu-img check -r all' on the image file.

Running qemu-img check -r all will cause the image file to be opened
r/w, which will replay the log file.  If a log file replay is detected,
this is flagged, and bdrv_check will increase the corruptions_fixed
count for the image.

[Fixed typo in error message that was pointed out by Eric Blake
<eblake@redhat.com>.
--Stefan]

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-20 09:11:58 +01:00
Paolo Bonzini
95de6d7078 block drivers: add discard/write_zeroes properties to bdrv_get_info implementation
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini
97b00e2851 vpc, vhdx: add get_info
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Jeff Cody
3412f7b1bd block: vhdx - add .bdrv_create() support
This adds support for VHDX image creation, for images of type "Fixed"
and "Dynamic".  "Differencing" types (i.e., VHDX images with backing
files) are currently not supported.

Options for image creation include:
    * log size:
        The size of the journaling log for VHDX.  Minimum is 1MB,
        and it must be a multiple of 1MB. Invalid log sizes will be
        silently fixed by rounding up to the nearest MB.

        Default is 1MB.

    * block size:
        This is the size of a payload block.  The range is 1MB to 256MB,
        inclusive, and must be a multiple of 1MB as well.  Invalid sizes
        and multiples will be silently fixed.  If '0' is passed, then
        a sane size is chosen (depending on virtual image size).

        Default is 0 (Auto-select).

    * subformat:
        - "dynamic"
            An image without data pre-allocated.
        - "fixed"
            An image with data pre-allocated.

        Default is "dynamic"

When creating the image file, the lettered sections are created:

-----------------------------------------------------------------.
|   (A)    |   (B)    |    (C)    |     (D)       |     (E)
|  File ID |  Header1 |  Header 2 |  Region Tbl 1 |  Region Tbl 2
|          |          |           |               |
.-----------------------------------------------------------------.
0         64KB      128KB       192KB           256KB          320KB

.---- ~ ----------- ~ ------------ ~ ---------------- ~ -----------.
|     (F)     |     (G)       |    (H)    |
| Journal Log |  BAT / Bitmap |  Metadata |  .... data ......
|             |               |           |
.---- ~ ----------- ~ ------------ ~ ---------------- ~ -----------.
1MB         (var.)          (var.)      (var.)

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:59 +01:00
Jeff Cody
1e74a971cb block: vhdx - break out code operations to functions
This is preperation for vhdx_create().  The ability to write headers,
and calculate the number of BAT entries will be needed within the
create() functions, so move this relevant code into helper functions.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:59 +01:00
Jeff Cody
c325ee1de8 block: vhdx - move more endian translations to vhdx-endian.c
In preparation for vhdx_create(), move more endian translation
functions out to vhdx-endian.c.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:59 +01:00
Jeff Cody
0b7da092b4 block: vhdx - remove BAT file offset bit shifting
Bit shifting can be fun, but in this case it was unnecessary.  The
upper 44 bits of the 64-bit BAT entry is specifies the File Offset,
so we shifted the bits to get access to the value.

However, per the spec the value is in MB.  So we dutifully shifted back
to the left by 20 bits, to convert to a true uint64_t file offset.

This replaces those steps with just a bit mask, to get rid of the lower
20 bits instead.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:59 +01:00
Jeff Cody
d92aa8833c block: vhdx write support
This adds support for writing to VHDX image files, using coroutines.
Writes into the BAT table goes through the VHDX log.  Currently, BAT
table writes occur when expanding a dynamic VHDX file, and allocating a
new BAT entry.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:59 +01:00
Jeff Cody
1a848fd451 block: vhdx - add region overlap detection for image files
Regions in the image file cannot overlap - the log, region tables,
and metdata must all be unique and non-overlapping.

This adds region checking by means of a QLIST; there can be a variable
number of regions and metadata (there may be metadata or region tables
that we do not recognize / know about, but are not required).

This adds the capability to register a region for later checking, and
to check against registered regions for any overlap.

Also, if neither the BAT or Metadata region tables are found, return
error.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:59 +01:00
Jeff Cody
0a43a1b5d7 block: vhdx - log parsing, replay, and flush support
This adds support for VHDX v0 logs, as specified in Microsoft's
VHDX Specification Format v1.00:
https://www.microsoft.com/en-us/download/details.aspx?id=34750

The following support is added:

* Log parsing, and validation - validate that an existing log
  is correct.

* Log search - search through an existing log, to find any valid
  sequence of entries.

* Log replay and flush - replay an existing log, and flush/clear
  the log when complete.

The VHDX log is a circular buffer, with elements (sectors) of 4KB.

A log entry is a variably-length number of sectors, that is
comprised of a header and 'descriptors', that describe each sector.

A log may contain multiple entries, know as a log sequence.  In a log
sequence, each log entry immediately follows the previous entry, with an
incrementing sequence number.  There can only ever be one active and
valid sequence in the log.

Each log entry must match the file log GUID in order to be valid (along
with other criteria).  Once we have flushed all valid log entries, we
marked the file log GUID to be zero, which indicates a buffer with no
valid entries.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:58 +01:00
Jeff Cody
c46415afc2 block: vhdx code movement - move vhdx_close() above vhdx_open()
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:58 +01:00
Jeff Cody
c3906c5e82 block: vhdx - update log guid in header, and first write tracker
Allow tracking of first file write in the VHDX image, as well as
the ability to update the GUID in the header.  This is in preparation
for log support.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:58 +01:00
Jeff Cody
0f48e8f097 block: vhdx - break endian translation functions out
This moves the endian translation functions out from the vhdx.c source,
into a separate source file. In addition to the previously defined
endian functions, new endian translation functions for log support are
added as well.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:58 +01:00