Commit Graph

235 Commits

Author SHA1 Message Date
Anthony Liguori 687db4ed2e block-verify: fix 32-bit build
Reported-by: Peter Lemenkov <lemenkov@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-09-22 14:46:33 -05:00
Stefan Hajnoczi d9d334176c blkverify: Add block driver for verifying I/O
The blkverify block driver makes investigating image format data
corruption much easier.  A raw image initialized with the same contents
as the test image (e.g. qcow2 file) must be provided.  The raw image
mirrors read/write operations and is used to verify that data read from
the test image is correct.

See docs/blkverify.txt for more information.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 17:00:53 +02:00
Kevin Wolf 6f5f060b73 qcow2: Avoid bounce buffers for AIO write requests
qcow2 used to use bounce buffers for any AIO requests. This does not only imply
unnecessary copying, but also unbounded allocations which should be avoided.

This patch removes bounce buffers from the normal AIO write path. Encrypted
images continue to use a bounce buffer, however with constant size.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:43 +02:00
Kevin Wolf bd28f83565 qcow2: Avoid bounce buffers for AIO read requests
qcow2 used to use bounce buffers for any AIO requests. This does not only imply
unnecessary copying, but also unbounded allocations which should be avoided.

This patch removes bounce buffers from the normal AIO read path, and constrains
them to a constant size for encrypted images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf 9f8e668eb1 qcow2: Get rid of additional sync on COW
We always have a sync for the refcount update when a new cluster is
allocated. If we move this past the COW, we can save an additional sync.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf 29216ed14f qcow2: Move sync out of qcow2_alloc_clusters
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf 1c4c28149f qcow2: Move sync out of update_refcount
Note that the flush is omitted intentionally in qcow2_free_clusters. If
anything, we can leak clusters here if we lose the writes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf c01828fb51 qcow2: Move sync out of write_refcount_block_entries
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Laurent Vivier c2e2872bf4 nbd: correctly manage default port
block/nbd.c: use default port number when none is specified
qemu-nbd.c:  use IANA-assigned port number: 10809

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Christoph Hellwig 581b9e29f3 raw-posix: handle > 512 byte alignment correctly
Replace the hardcoded handling of 512 byte alignment with bs->buffer_alignment
to handle larger sector size devices correctly.

Note that we can not rely on it to be initialize in bdrv_open, so deal
with the worst case there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf a655211ac6 vvfat: Use cache=unsafe
The qcow file used for write support in vvfat is a temporary file,
so we can use cache=unsafe there. Without this, write support is just
too slow to be of any use.

Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
2010-09-21 15:39:42 +02:00
Kevin Wolf 9217e26f43 vvfat: Fix double free for opening the image rw
Allocation and deallocation of bs->opaque is not in the control of a
block driver. Therefore it should not set bs->opaque to a data structure
used by another bs, or closing the image will lead to a double free.

Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
2010-09-21 15:39:42 +02:00
Kevin Wolf ac48e389d0 vvfat: Fix segfault on write to read-only disk
vvfat tries to set the readonly flag in its open function, but nowadays
this is overwritted with the readonly=... command line option. Check in
bdrv_write if the vvfat was opened read-only and return an error in this
case.

Without this check, vvfat tries to access the qcow bs, which is NULL
without enabled write support.

Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
2010-09-21 15:39:42 +02:00
Blue Swirl 95ee3914bf blkdebug: fix enum comparison
The signedness of enum types depend on the compiler implementation.
Therefore the check for negative values may or may not be meaningful.

Fix by explicitly casting to a signed integer.

Since the values are also checked earlier against event_names
table, this is an internal error. Change the 'if' to 'assert'.

This also avoids a warning with GCC flag -Wtype-limits.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-09-18 05:53:15 +00:00
Anthony Liguori 8b33d9eeba Revert "Make default invocation of block drivers safer (v3)"
This reverts commit 79368c81bf.

Conflicts:

	block.c

I haven't been able to come up with a solution yet for the corruption caused by
unaligned requests from the IDE disk so revert until a solution can be written.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-09-08 17:09:15 -05:00
Kevin Wolf 7ec5e6a4ca qcow2: Remove unnecessary flush after L2 write
When a new cluster was allocated, we only need a flush after the write to the
L2 table if it was a COW and we need to decrease the refcounts of the old
clusters.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-08 12:39:24 +02:00
Bernhard Kohl 05acda4d16 raw-posix: improve detection of scsi-generic devices
Allow symbolic links which point to /dev/sgX devices.

Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-08 12:39:19 +02:00
Kevin Wolf 897804d629 raw-posix: Don't use file name for host_cdrom detection on Linux
On Linux, we have code to detect CD-ROMs using an ioctl. We shouldn't lose
anything but false positives by removing the check for a /dev/cd* path.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-08 12:39:16 +02:00
Laurent Vivier 1d45f8b542 nbd: Introduce NBD named exports.
This patch allows to connect Qemu using NBD protocol to an nbd-server
using named exports.

For instance, if on the host "isoserver", in /etc/nbd-server/config, you have:

[generic]
[debian-500-ppc-netinst]
        exportname = /ISO/debian-500-powerpc-netinst.iso
[Fedora-10-ppc-netinst]
        exportname = /ISO/Fedora-10-ppc-netinst.iso

You can connect to it, using:

    qemu -cdrom nbd:isoserver:exportname=debian-500-ppc-netinst
    qemu -cdrom nbd:isoserver:exportname=Fedora-10-ppc-netinst

NOTE: you need at least nbd-server 2.9.18

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Loïc Minier 2aa326be0d vvfat: fat_chksum(): fix access above array bounds
Signed-off-by: Loïc Minier <loic.minier@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Izumi Tsutsui 010cb2b314 sheepdog: remove unnecessary includes
"qemu_socket.h" includes all necessary files and
including <netinet/tcp.h> without <netinet/in.h>
could cause errors on some systems.

Signed-off-by: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Kevin Wolf 336c1c1255 block: Fix bdrv_has_zero_init
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Stefan Weil 2ee9fb4801 block: Replace u_int8_t, u_int16_t, u_int32_t, u_int64_t by standard int types
There is no need to have a second set of integral types.
Replace them by the standard types from stdint.h.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-07-25 16:59:38 +02:00
Anthony Liguori 79368c81bf Make default invocation of block drivers safer (v3)
CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-07-15 08:17:06 -05:00
MORITA Kazutaka 6defcc3784 sheepdog: fix compile error on systems without TCP_CORK
WIN32 is not only the system which doesn't have TCP_CORK (e.g. OS X).

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-07-07 20:54:56 +03:00
MORITA Kazutaka 33b1db1c88 block: add sheepdog driver for distributed storage support
Sheepdog is a distributed storage system for QEMU. It provides highly
available block level storage volumes to VMs like Amazon EBS.  This
patch adds a qemu block driver for Sheepdog.

Sheepdog features are:
- No node in the cluster is special (no metadata node, no control
  node, etc)
- Linear scalability in performance and capacity
- No single point of failure
- Autonomous management (zero configuration)
- Useful volume management support such as snapshot and cloning
- Thin provisioning
- Autonomous load balancing

The more details are available at the project site:
    http://www.osrg.net/sheepdog/

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:50 +02:00
Markus Armbruster 65d21bc73b raw-posix: Fix test for host CD-ROM
raw_pread_aligned() retries up to two times if the block device backs
a virtual CD-ROM (a drive with media=cdrom and if=ide, scsi, xen or
none).  This makes no sense.  Whether retrying reads can correct read
errors can only depend on what we're reading, not on how the result
gets used.  We need to check what whether we're reading from a
physical CD-ROM or floppy here.

I doubt retrying is useful even then.  Left for another day.

Impact:

* Virtual CD-ROM backed by host_cdrom behaves the same.

* Virtual CD-ROM backed by file or host_device no longer retries.

* A drive backed by host_cdrom now retries even if it's not a virtual
  CD-ROM.

* Any drive backed by host_floppy now retries.

While there, clean up gratuitous use of goto.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:49 +02:00
Kevin Wolf 9ac228e02c qcow2/vdi: Change check to distinguish error cases
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:49 +02:00
Kevin Wolf 8db520cee8 blkdebug: Initialize state as 1
state = 0 in rules means that the rule is valid for any state. Therefore it's
impossible to have a rule that works only in the initial state. This changes
the initial state from 0 to 1 to make this possible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Kevin Wolf 698f0d52cd blkdebug: Free QemuOpts after having read the config
Forgetting to free them means that the next instance inherits all rules and
gets its own rules only additionally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Kevin Wolf 327cdad416 blkdebug: Fix set_state_opts definition
The list head was initialized to point to the wrong list, so all actions ended
up being handled as inject-error even if they were set-state in fact.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Kevin Wolf 19dbcbf7cc qcow2: Fix error handling during metadata preallocation
People were wondering why qemu-img check failed after they tried to preallocate
a large qcow2 file and ran out of disk space.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:01 +02:00
Kevin Wolf f74550fd53 qcow2: Don't try to check tables that couldn't be loaded
Trying to check them leads to a second error message which is more confusing
than helpful:

    Can't get refcount for cluster 0: Invalid argument
    ERROR cluster 0 refcount=-22 reference=1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 6882c8fa78 qcow2: Fix qemu-img check segfault on corrupted images
With corrupted images, we can easily get an cluster index that exceeds the
array size of the temporary refcount table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 078a458e07 vpc: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf b8852e87d9 vmdk: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 8b3b720620 qcow2: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 5e5557d970 qcow: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf b0ad5a455d cow: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.
While at it, correct the wrong usage of errno.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Christoph Hellwig 2063392ae5 cow: use qemu block API
Use bdrv_pwrite to access the backing device instead of pread, and
convert the driver to implementing the bdrv_open method which gives
it an already opened BlockDriverState for the underlying device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Christoph Hellwig 893a9cb47c cow: stop using mmap
We don't have an equivalent to mmap in the qemu block API, so read and
write the bitmap directly.  At least in the dumb implementation added
in this patch this is a lot less efficient, but it means cow can also
work on windows, and over nbd or curl.  And it fixes qemu-iotests testcase
012 which did not work properly due to issues with read-only mmap access.

In addition we can also get rid of the now unused get_mmap_addr function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Christoph Hellwig 122bb9e32d cow: use pread/pwrite
Use pread/pwrite instead of lseek + read/write in preparation of using the
qemu block API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Kevin Wolf 68dba0bf45 qcow2: Restore L1 entry on l2_allocate failure
If writing the L1 table to disk failed, we need to restore its old content in
memory to avoid inconsistencies.

Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
Kevin Wolf e14e8ba5d0 qcow2: Return real error code in load_refcount_block
This fixes load_refcount_block which completely ignored the return value of
write_refcount_block and always returned -EIO for bdrv_pwrite failure.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
Kevin Wolf 2eaa8f6338 qcow2: Allow alloc_clusters_noref to return errors
Currently it would consider blocks for which get_refcount fails used. However,
it's unlikely that get_refcount would succeed for the next cluster, so it's not
really helpful. Return an error instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
Kevin Wolf 018faafdbd qcow2: Allow get_refcount to return errors
get_refcount might need to load a refcount block from disk, so errors may
happen. Return the error code instead of assuming a refcount of 1 and change
the callers to respect error return values.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
Kevin Wolf 6c6ea921ff vpc: Read/write multiple sectors at once
This changes the vpc block driver (for VHD) to read/write multiple sectors at
once instead of doing a request for each single sector.

Before this, running qemu-iotests for VPC took ages, now it's actually quite
reasonable to run it always (down from ~1 hour to 40 seconds for me).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
Anthony Liguori a4673e2762 Merge remote branch 'kwolf/for-anthony' into staging
Conflicts:
	hw/pc.c
2010-06-14 10:33:36 -05:00
Paul Brook 11165820d1 Move stdbool.h
Move inclusion of stdbool.h to common header files, instead of including
in an ad-hoc manner.

Signed-off-by: Paul Brook <paul@codesourcery.com>
2010-06-13 19:00:50 +01:00
Jes Sorensen 9040385dcc Cleanup: raw-posix.c: Be more consistent using BDRV_SECTOR_SIZE instead of 512
Clean up raw-posix.c to be more consistent using BDRV_SECTOR_SIZE
instead of hard coded 512 values.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:39 +02:00