Commit Graph

2868 Commits

Author SHA1 Message Date
Paolo Bonzini
1919631e6b block: explicitly acquire aiocontext in bottom halves that need it
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-15-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:39:39 +00:00
Paolo Bonzini
9d45665448 block: explicitly acquire aiocontext in callbacks that need it
This covers both file descriptor callbacks and polling callbacks,
since they execute related code.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-14-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:39:36 +00:00
Paolo Bonzini
2f47da5f7f block: explicitly acquire aiocontext in timers that need it
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-13-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:14:08 +00:00
Paolo Bonzini
b20123a28b qed: introduce qed_aio_start_io and qed_aio_next_io_cb
qed_aio_start_io and qed_aio_next_io will not have to acquire/release
the AioContext, while qed_aio_next_io_cb will.  Split the functionality
and gain a little type-safety in the process.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-11-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:14:08 +00:00
Paolo Bonzini
e5c67ab552 blkdebug: reschedule coroutine on the AioContext it is running on
Keep the coroutine on the same AioContext.  Without this change,
there would be a race between yielding the coroutine and reentering it.
While the race cannot happen now, because the code only runs from a single
AioContext, this will change with multiqueue support in the block layer.

While doing the change, replace custom bottom half with aio_co_schedule.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-10-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:14:08 +00:00
Paolo Bonzini
ff82911cd3 nbd: convert to use qio_channel_yield
In the client, read the reply headers from a coroutine, switching the
read side between the "read header" coroutine and the I/O coroutine that
reads the body of the reply.

In the server, if the server can read more requests it will create a new
"read request" coroutine as soon as a request has been read.  Otherwise,
the new coroutine is created in nbd_request_put.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-8-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:14:08 +00:00
Paolo Bonzini
35f106e684 block-backend: allow blk_prw from coroutine context
qcow2_create2 calls this.  Do not run a nested event loop, as that
breaks when aio_co_wake tries to queue the coroutine on the co_queue_wakeup
list of the currently running one.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-4-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:14:07 +00:00
Paolo Bonzini
c2b38b277a block: move AioContext, QEMUTimer, main-loop to libqemuutil
AioContext is fairly self contained, the only dependency is QEMUTimer but
that in turn doesn't need anything else.  So move them out of block-obj-y
to avoid introducing a dependency from io/ to block-obj-y.

main-loop and its dependency iohandler also need to be moved, because
later in this series io/ will call iohandler_get_aio_context.

[Changed copyright "the QEMU team" to "other QEMU contributors" as
suggested by Daniel Berrange and agreed by Paolo.
--Stefan]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:14:07 +00:00
Alberto Garcia
7061a07898 qcow2: Optimize the refcount-block overlap check
The metadata overlap checks introduced in a40f1c2add help detect
corruption in the qcow2 image by verifying that data writes don't
overlap with existing metadata sections.

The 'refcount-block' check in particular iterates over the refcount
table in order to get the addresses of all refcount blocks and check
that none of them overlap with the region where we want to write.

The problem with the refcount table is that since it always occupies
complete clusters its size is usually very big. With the default
values of cluster_size=64KB and refcount_bits=16 this table holds 8192
entries, each one of them enough to map 2GB worth of host clusters.

So unless we're using images with several TB of allocated data this
table is going to be mostly empty, and iterating over it is a waste of
CPU. If the storage backend is fast enough this can have an effect on
I/O performance.

This patch keeps the index of the last used (i.e. non-zero) entry in
the refcount table and updates it every time the table changes. The
refcount-block overlap check then uses that index instead of reading
the whole table.

In my tests with a 4GB qcow2 file stored in RAM this doubles the
amount of write IOPS.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170201123828.4815-1-berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:43 +01:00
Peter Lieven
f67409a5bb block/nfs: fix naming of runtime opts
commit 94d6a7a accidentally left the naming of runtime opts and QAPI
scheme inconsistent. As one consequence passing of parameters in the
URI is broken. Sync the naming of the runtime opts to the QAPI
scheme.

Please note that this is technically backwards incompatible with the 2.8
release, but the 2.8 release is the only version that had the wrong naming.
Furthermore release 2.8 suffered from a NULL pointer dereference during
URI parsing.

Fixes: 94d6a7a76e
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1485942829-10756-3-git-send-email-pl@kamp.de
[mreitz: Fixed commit message]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:42 +01:00
Peter Lieven
8d20abe87a block/nfs: fix NULL pointer dereference in URI parsing
parse_uint_full wants to put the parsed value into the
variable passed via its second argument which is NULL.

Fixes: 94d6a7a76e
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1485942829-10756-2-git-send-email-pl@kamp.de
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:42 +01:00
Dou Liyang
a6baa60807 block/qapi: reduce the execution time of qmp_query_blockstats
In order to reduce the execution time, this patch optimize
the qmp_query_blockstats():
Remove the next_query_bds function.
Remove the bdrv_query_stats function.
Remove some judgement sentence.

The original qmp_query_blockstats calls next_query_bds to get
the next objects in each loops. In the next_query_bds, it checks
the query_nodes and blk. It also call bdrv_query_stats to get
the stats, In the bdrv_query_stats, it checks blk and bs each
times. This waste more times, which may stall the main loop a
bit. And if the disk is too many and donot use the dataplane
feature, this may affect the performance in main loop thread.

This patch removes that two functions, and makes the structure
clearly.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Message-id: 1484467275-27919-3-git-send-email-douly.fnst@cn.fujitsu.com
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[mreitz: Removed duplicate info->value assignment]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:42 +01:00
Dou Liyang
20a6d768f5 block/qapi: reduce the coupling between the bdrv_query_stats and bdrv_query_bds_stats
The bdrv_query_stats and bdrv_query_bds_stats functions need to call
each other, that increases the coupling. it also makes the program
complicated and makes some unnecessary tests.

Remove the call from bdrv_query_bds_stats to bdrv_query_stats, just
take some recursion to make it clearly.

Avoid testing whether the blk is NULL during querying the bds stats.
It is unnecessary.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Message-id: 1484467275-27919-2-git-send-email-douly.fnst@cn.fujitsu.com
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:42 +01:00
QingFeng Hao
4545d4f4af block/vmdk: Fix the endian problem of buf_len and lba
The problem was triggered by qemu-iotests case 055. It failed when it
was comparing the compressed vmdk image with original test.img.

The cause is that buf_len in vmdk_write_extent wasn't converted to
little-endian before it was stored to disk. But later vmdk_read_extent
read it and converted it from little-endian to cpu endian.
If the cpu is big-endian like s390, the problem will happen and
the data length read by vmdk_read_extent will become invalid!
The fix is to add the conversion in vmdk_write_extent, meanwhile,
repair the endianness problem of lba field which shall also be converted
to little-endian before storing to disk.

Cc: qemu-stable@nongnu.org
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20161216052040.53067-2-haoqf@linux.vnet.ibm.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:42 +01:00
Fam Zheng
9adceb0213 qapi: Tweak error message of bdrv_query_image_info
@bs doesn't always have a device name, such as when it comes from
"qemu-img info". Report file name instead.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20170119130759.28319-2-famz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:41 +01:00
Peter Maydell
4e9f5244e1 -----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYkeZAAAoJEJykq7OBq3PI6oUH/3qlRvQrWmhWLR+XCtwU0gON
 HRApL57Of+B1YbqJzb8wzjLMLfzZQYLoT7kf3FDRON751Iwpv2Qyl6j79kbmOQwy
 txvtgUTtPZrOZ9HMk6M1VboiKrkM1t0I1QiRYy/af2f1gD3KTqIt8YN1ic3xatKD
 Fgmx+oD+6EkrNilthemvDyaXtGsdTl4GC9ZbGcJB2VJzzWkksRUfeZWysIu9p2zP
 l6viegW/1+o5wYgBt6DxMalfNGbEiuBgXgx6PVFPbkw0xNURC52qDHhQ91xTSWt1
 pvFrIhYWR/ETN0twJh+jtmCjkawKWSsx2nrLlrSh4H0EpwFoRfFqH/ZrOFSg0wg=
 =QnCX
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Wed 01 Feb 2017 13:44:32 GMT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request:
  trace: clean up trace-events files
  qapi: add missing trace_visit_type_enum() call
  trace: improve error reporting when parsing simpletrace header
  trace: update docs to reflect new code generation approach
  trace: switch to modular code generation for sub-directories
  trace: move setting of group name into Makefiles
  trace: move hw/i386/xen events to correct subdir
  trace: move hw/xen events to correct subdir
  trace: move hw/block/dataplane events to correct subdir
  make: move top level dir to end of include search path

# Conflicts:
#	Makefile

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-02 16:08:28 +00:00
Paolo Bonzini
acf6e5f096 sheepdog: reorganize check for overlapping requests
Wrap the code that was copied repeatedly in the two functions,
sd_aio_setup and sd_aio_complete.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161129113245.32724-6-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-02-01 00:17:20 -05:00
Paolo Bonzini
c4080e9391 sheepdog: simplify inflight_aio_head management
Add to the list in add_aio_request and, indirectly, resend_aioreq.  Inline
free_aio_req in the caller, it does not simply undo alloc_aio_req's job.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161129113245.32724-5-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-02-01 00:17:20 -05:00
Paolo Bonzini
28ddd08cd6 sheepdog: do not use BlockAIOCB
Sheepdog's AIOCB are completely internal entities for a group of
requests and do not need dynamic allocation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161129113245.32724-4-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-02-01 00:17:20 -05:00
Paolo Bonzini
e80ab33dc0 sheepdog: reorganize coroutine flow
Delimit co_recv's lifetime clearly in aio_read_response.

Do a simple qemu_coroutine_enter in aio_read_response, letting
sd_co_writev call sd_write_done.

Handle nr_pending in the same way in sd_co_rw_vector,
sd_write_done and sd_co_flush_to_disk.

Remove sd_co_rw_vector's return value; just leave with no
pending requests.

[Jeff: added missing 'return' back, spotted by Paolo after
       series was applied.]

Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-02-01 00:17:20 -05:00
Paolo Bonzini
a71264f9f5 sheepdog: remove unused cancellation support
SheepdogAIOCB is internal to sheepdog.c, hence it is never canceled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161129113245.32724-2-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-02-01 00:17:20 -05:00
Stefan Hajnoczi
7f4076c1bb trace: clean up trace-events files
There are a number of unused trace events that
scripts/cleanup-trace-events.pl finds.  The "hw/vfio/pci-quirks.c"
filename was typoed and "qapi/qapi-visit-core.c" was missing the qapi/
directory prefix.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20170126171613.1399-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-31 17:12:15 +00:00
Peter Lieven
2deb63c2da block/iscsi: statically link qemu_iscsi_opts
commit f57b4b5f moved qemu_iscsi_opts into vl.c. This
made them invisible for qemu-img, qemu-nbd etc.

Fixes: f57b4b5fb1
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1485262161-18543-1-git-send-email-pl@kamp.de>
[Drop useless #ifdef. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:58 +01:00
Eric Farman
c4c41a0a65 block: get max_transfer limit for char (scsi-generic) devices
We can get the maximum number of bytes for a single I/O transfer
from the BLKSECTGET ioctl, but we only perform this for block
devices.  scsi-generic devices are represented as character devices,
and so do not issue this today.  Update this, so that virtio-scsi
devices using the scsi-generic interface can return the same data.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Message-Id: <20170120162527.66075-4-farman@linux.vnet.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:31 +01:00
Eric Farman
482652502e block: Fix target variable of BLKSECTGET ioctl
Commit 6f6071745b ("raw-posix: Fetch max sectors for host block device")
introduced a routine to call the kernel BLKSECTGET ioctl, which stores the
result back to user space.  However, the size of the data returned depends
on the routine handling the ioctl.  The (compat_)blkdev_ioctl returns a
short, while sg_ioctl returns an int.  Thus, on big-endian systems, we can
find ourselves accidentally shifting the result to a much larger value.
(On s390x, a short is 16 bits while an int is 32 bits.)

Also, the two ioctl handlers return values in different scales (block
returns sectors, while sg returns bytes), so some tweaking of the outputs
is required such that hdev_get_max_transfer_length returns a value in a
consistent set of units.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Message-Id: <20170120162527.66075-3-farman@linux.vnet.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:31 +01:00
Peter Lieven
1da45e0c4c block/iscsi: avoid data corruption with cache=writeback
nb_cls_shrunk in iscsi_allocmap_update can become -1 if the
request starts and ends within the same cluster. This results
in passing -1 to bitmap_set and bitmap_clear and they don't
handle negative values properly. In the end this leads to data
corruption.

Fixes: e1123a3b40
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1484579832-18589-1-git-send-email-pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:31 +01:00
Ashijeet Acharya
fe44dc9180 migration: disallow migrate_add_blocker during migration
If a migration is already in progress and somebody attempts
to add a migration blocker, this should rightly fail.

Add an errp parameter and a retcode return value to migrate_add_blocker.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Message-Id: <1484566314-3987-5-git-send-email-ashijeetacharya@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Merged with recent 'Allow invtsc migration' change
2017-01-24 18:00:30 +00:00
Ashijeet Acharya
05551e58ee block/vvfat: Remove the undesirable comment
Remove the "// assert(is_consistent(s))" comment in block/vvfat.c

Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Message-Id: <1484566314-3987-2-git-send-email-ashijeetacharya@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-01-24 17:54:47 +00:00
Paolo Bonzini
8f90b5e91d block: get rid of bdrv_io_unplugged_begin/end
bdrv_io_plug and bdrv_io_unplug are only called (via their
BlockBackend equivalents) after starting asynchronous I/O.
bdrv_drain is not going to be called while they are running,
because---even if a coroutine runs for some reason---it will
only drain in the next iteration of the event loop through
bdrv_co_yield_to_drain.

So this mechanism is unnecessary, get rid of it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161129113334.605-1-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-16 13:25:17 +00:00
Eric Blake
c1bb86cd8a block: Rename raw-{posix,win32} to file-*.c
These files deal with the file protocol, not the raw format (the
file protocol is often used with other formats, and the raw
format is not forced to use the file protocol).  Rename things
to make it a bit easier to follow.

Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-01-09 13:30:53 +01:00
Eric Blake
2e6fc7eb1a block: Rename raw_bsd to raw-format.c
Given that we have raw-win32.c and raw-posix.c, my initial guess at
raw_bsd.c was that it was for dealing with raw files using code
specific to the BSD operating system (beyond what raw-posix could
do).  Not so - this name was chosen back in commit e1c66c6 to
distinguish that it was a BSD licensed file, in contrast to the
then-existing raw.c with an unclear and potentially unusable
license.  But since it has been more than three years since the
rewrite, it's time to pick a more useful name for this file to
avoid this type of confusion to future contributors that don't know
the backstory, as none of our other files are named solely by the
license they use.

In reality, this file deals with the raw format, which is useful
with any number of protocols, while raw-{win32,posix} deal with
the file protocol (and in turn, that protocol is not limited to
use with the raw format).  So rename raw_bsd to raw-format.c.  We
could have also used the shorter name raw.c, except that collides
with the earlier use of that filename for a different license,
and it's better to be safe than risk license pollution.

The next patch will also rename raw-win32.c and raw-posix.c to
further distinguish the difference in roles.

It doesn't hurt that this gets rid of an underscore in the filename,
thereby making tab-completion on 'ra<TAB>' easier (now I don't have
to type the shift key, which slows things down :)

Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
44b6789299 blkverify: Implement bdrv_co_preadv/pwritev/flush
This enables byte granularity requests for blkverify, and at the same
time gets us rid of another user of the BDS-level AIO emulation.

The reference output of a test case must be changed because the
verification failure message reports byte offsets instead of sectors
now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
7c3a998531 blkdebug: Implement bdrv_co_preadv/pwritev/flush
This enables byte granularity requests for blkdebug, and at the same
time gets us rid of another user of the BDS-level AIO emulation.

Note that unless align=512 is specified, this can behave subtly
different from the old behaviour because bdrv_co_preadv/pwritev don't
have to perform alignment adjustments any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
7c37f941d0 quorum: Clean up quorum_aio_get()
Make sure that all fields of the new QuorumAIOCB are zeroed when the
function returns even without explicitly setting them. This will protect
us when new fields are added, removes some explicit zero assignment and
makes the code a little nicer to read.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
a7e159025e quorum: Inline quorum_fifo_aio_cb()
Inlining the function removes some boilerplace code and replaces
recursion by a simple loop, so the code becomes somewhat easier to
understand.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
6847da3808 quorum: Implement .bdrv_co_preadv/pwritev()
This enables byte granularity requests on quorum nodes.

Note that the QMP events emitted by the driver are an external API that
we were careless enough to define as sector based. The offset and length
of requests reported in events are rounded therefore.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
dee66e2882 quorum: Avoid bdrv_aio_writev() for rewrites
Replacing it with bdrv_co_pwritev() prepares us for byte granularity
requests and gets us rid of the last bdrv_aio_*() user in quorum.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
7cd9b3964e quorum: Inline quorum_aio_cb()
This is a conversion to a more natural coroutine style and improves the
readability of the driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
0f31977d9d quorum: Do cleanup in caller coroutine
Instead of calling quorum_aio_finalize() deeply nested in what used
to be an AIO callback, do it in the same functions that allocated the
AIOCB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
ce15dc08ef quorum: Implement .bdrv_co_readv/writev
This converts the quorum block driver from implementing callback-based
interfaces for read/write to coroutine-based ones. This is the first
step that will allow us further simplification of the code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2017-01-09 13:30:52 +01:00
Kevin Wolf
10c8551968 quorum: Remove s from quorum_aio_get() arguments
There is no point in passing the value of bs->opaque in order to
overwrite it with itself.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2017-01-09 13:30:52 +01:00
Stefan Hajnoczi
ee68697551 linux-aio: poll ring for completions
The Linux AIO userspace ABI includes a ring that is shared with the
kernel.  This allows userspace programs to process completions without
system calls.

Add an AioContext poll handler to check for completions in the ring.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161201192652.9509-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03 16:38:48 +00:00
Stefan Hajnoczi
f6a51c84cd aio: add AioPollFn and io_poll() interface
The new AioPollFn io_poll() argument to aio_set_fd_handler() and
aio_set_event_handler() is used in the next patch.

Keep this code change separate due to the number of files it touches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161201192652.9509-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03 16:38:48 +00:00
Stefan Hajnoczi
68701de136 Block layer patches for 2.8.0-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJYRs7XAAoJEH8JsnLIjy/W5ScP/0ukNPAUUYGnU5dIj6q20kRk
 kIhEDZXNRMqT9qhSypi5ivOc/HDIXBUM5jsuAPHTK/JxMWeAFCwOwKXzKLPEZ9iG
 nA5UGMST4uwwB0bANUAyweGIHdTIfhgN2dgLKvKIsbeTNmCyMaJieZ29fkVbNKyS
 msOFmaVn+nm4rgE/q7HXtg/hUdjuoaEOsWsJ7YY3Bj8kgQR/H8iPCvNCl3YWGwIW
 9vXQL25QUaQbBWinA+rHhHiGPAK2GzitVry3fQGch5j4OqpXYt3IQTSqXZMQLHcT
 zUwcx99C16hG9R7sjhNuto+lMuw6qtK75s/7PGpjw8aQFwYR5ITAyB369dxmrGqc
 1bBPsRCXfhWku/4wMzrj4fO7iszMadBIzChwk+IsCRNAWHFGoc9VHvk3mdT4puBB
 2W4JlOzGY/FD/rQRetGGmGN09HheRZ5sW7o9DoUoGBLCk1llIrVs/fKtHLDtx1T9
 sNe5e7EYdNufBTpy7p/75nRMyYlVlENJW/A+nw2pvsEZjU9LAdWqEEfmU1OU1rdl
 TEZxBZQtPq0ZkfQIV6mPUFBhE3e8EbB573jJaD2P8StpR6jCFRZlU2hkW4c7FnZ1
 pUfc8PJFP4Bo6CaNy7PMUCUHD7z8eZP/BFndODAgBFm2WAxN2tsRrfPDWh23huZV
 k9+VZFjeT2jz+mHtRCbA
 =9y67
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging

Block layer patches for 2.8.0-rc3

# gpg: Signature made Tue 06 Dec 2016 02:44:39 PM GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* kwolf/tags/for-upstream:
  qcow2: Don't strand clusters near 2G intervals during commit

Message-id: 1481037418-10239-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-12-06 17:35:29 +00:00
Eric Blake
a3e1505dae qcow2: Don't strand clusters near 2G intervals during commit
The qcow2_make_empty() function is reached during 'qemu-img commit',
in order to clear out ALL clusters of an image.  However, if the
image cannot use the fast code path (true if the image is format
0.10, or if the image contains a snapshot), the cluster size is
larger than 512, and the image is larger than 2G in size, then our
choice of sector_step causes problems.  Since it is not cluster
aligned, but qcow2_discard_clusters() silently ignores an unaligned
head or tail, we are leaving clusters allocated.

Enhance the testsuite to expose the flaw, and patch the problem by
ensuring our step size is aligned.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-12-06 15:37:02 +01:00
Prasanna Kumar Kalever
7103d9165b block/nfs: fix QMP to match debug option
The QMP definition of BlockdevOptionsNfs:
{ 'struct': 'BlockdevOptionsNfs',
  'data': { 'server': 'NFSServer',
            'path': 'str',
            '*user': 'int',
            '*group': 'int',
            '*tcp-syn-count': 'int',
            '*readahead-size': 'int',
            '*page-cache-size': 'int',
            '*debug-level': 'int' } }

To make this consistent with other block protocols like gluster, lets
change s/debug-level/debug/

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-12-05 16:30:21 -05:00
Prasanna Kumar Kalever
1a417e46ae block/gluster: fix QMP to match debug option
The QMP definition of BlockdevOptionsGluster:
{ 'struct': 'BlockdevOptionsGluster',
  'data': { 'volume': 'str',
            'path': 'str',
            'server': ['GlusterServer'],
            '*debug-level': 'int',
            '*logfile': 'str' } }

But instead of 'debug-level we have exported 'debug' as the option for choosing
debug level of gluster protocol driver.

This patch fix QMP definition BlockdevOptionsGluster
s/debug-level/debug/

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-12-05 16:30:15 -05:00
Stefan Hajnoczi
f05234df63 Block layer patches for 2.8.0-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJYPZu6AAoJEH8JsnLIjy/W1IAP/AwV0sWafsSMnWiz/4NVqeh3
 Yk2cBtxCBmnq1y+PilLoZBdHui/RumwVuZKaShs3JA1n5CB1AjsVtEVl/6rQM7lv
 yymLr32pODuf4eaGwGY09FqTiL0Erlm846zbSDjkiKbTYoKpzRv0PT2iiA6yTnjO
 Mrs5nG7kEWdXPZ0ZsJyEyU3+vs7rNg+4N/VfTdPmCrV5DVBvAeCawM6JXHQNc7LV
 ER6Y8W9PAu5mYqwekjAW07lPCudytAsOTrbTTO9Sv/+kZUdKEmv7ZHJrPdECCb6N
 vcPOYOzKsEvvR8E0YZtuJDK9W4RTakxdlTste+TtW3VSt1Cs0zpvCFytaGuC+Kmq
 mhlA4lYLDvaiNOMl09SvIjjxGI7+FO+1XsY7e4rI5PJzOKWZMFOIwQMNxE3B2qUI
 dxd6izf7fzF4V5uDDwHTJ8TAiJDSAe6Bkz+vzipQtu5NARl/isbQuIPIGXPkxZln
 fkCYA8/7EXrLXqd3khiRqEHS60ZtNgfm4ss8euMlWAgJAz0RLC1d/XhOIxaCQOg3
 R/F9UdJAon6mfOgamZs5yzJgaPU6M90g/QipMB3Ub00VODacTiA81QUjZdEgELBB
 zvhgeja7qdIvOh9r9heCCuUTmkWRRppmkrKdqFLowZ2aWosISy/UjiPTGdjDdq7Z
 LsfYiRXsW94FmNXdKCqg
 =3WTz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging

Block layer patches for 2.8.0-rc2

# gpg: Signature made Tue 29 Nov 2016 03:16:10 PM GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* kwolf/tags/for-upstream:
  docs: Specify that cache-clean-interval is only supported in Linux
  qcow2: Remove stale comment
  qcow2: Allow 'cache-clean-interval' in Linux only
  qcow2: Make qcow2_cache_table_release() work only in Linux

Message-id: 1480436227-2211-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-29 17:06:39 +00:00
Alberto Garcia
a8b99dd516 qcow2: Remove stale comment
We haven't been using CONFIG_MADVISE since 02d0e09503

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-25 13:51:30 +01:00
Alberto Garcia
91203f08f0 qcow2: Allow 'cache-clean-interval' in Linux only
The cache-clean-interval option of qcow2 only works on Linux. However
we allow setting it in other systems regardless of whether it works or
not.

In those systems this option is not simply a no-op: it actually
invalidates perfectly valid cache tables for no good reason without
freeing their memory.

This patch forbids using that option in non-Linux systems.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-25 13:51:30 +01:00