Commit Graph

3737 Commits

Author SHA1 Message Date
John Snow 35d6b368f2 blockjobs: ensure abort is called for cancelled jobs
Presently, even if a job is canceled post-completion as a result of
a failing peer in a transaction, it will still call .commit because
nothing has updated or changed its return code.

The reason why this does not cause problems currently is because
backup's implementation of .commit checks for cancellation itself.

I'd like to simplify this contract:

(1) Abort is called if the job/transaction fails
(2) Commit is called if the job/transaction succeeds

To this end: A job's return code, if 0, will be forcibly set as
-ECANCELED if that job has already concluded. Remove the now
redundant check in the backup job implementation.

We need to check for cancellation in both block_job_completed
AND block_job_completed_single, because jobs may be cancelled between
those two calls; for instance in transactions. This also necessitates
an ABORTING -> ABORTING transition to be allowed.

The check in block_job_completed could be removed, but there's no
point in starting to attempt to succeed a transaction that we know
in advance will fail.

This does NOT affect mirror jobs that are "canceled" during their
synchronous phase. The mirror job itself forcibly sets the canceled
property to false prior to ceding control, so such cases will invoke
the "commit" callback.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow 75f710599f blockjobs: add block_job_dismiss
For jobs that have reached their CONCLUDED state, prior to having their
last reference put down (meaning jobs that have completed successfully,
unsuccessfully, or have been canceled), allow the user to dismiss the
job's lingering status report via block-job-dismiss.

This gives management APIs the chance to conclusively determine if a job
failed or succeeded, even if the event broadcast was missed.

Note: block_job_do_dismiss and block_job_decommission happen to do
exactly the same thing, but they're called from different semantic
contexts, so both aliases are kept to improve readability.

Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS, she
has a friend coming in a future patch to fill the hole where 0x02 is.

Verbs:
Dismiss: operates on CONCLUDED jobs only.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow 0ec4dfb8d6 blockjobs: add block_job_verb permission table
Which commands ("verbs") are appropriate for jobs in which state is
also somewhat burdensome to keep track of.

As of this commit, it looks rather useless, but begins to look more
interesting the more states we add to the STM table.

A recurring theme is that no verb will apply to an 'undefined' job.

Further, it's not presently possible to restrict the "pause" or "resume"
verbs any more than they are in this commit because of the asynchronous
nature of how jobs enter the PAUSED state; justifications for some
seemingly erroneous applications are given below.

=====
Verbs
=====

Cancel:    Any state except undefined.
Pause:     Any state except undefined;
           'created': Requests that the job pauses as it starts.
           'running': Normal usage. (PAUSED)
           'paused':  The job may be paused for internal reasons,
                      but the user may wish to force an indefinite
                      user-pause, so this is allowed.
           'ready':   Normal usage. (STANDBY)
           'standby': Same logic as above.
Resume:    Any state except undefined;
           'created': Will lift a user's pause-on-start request.
           'running': Will lift a pause request before it takes effect.
           'paused':  Normal usage.
           'ready':   Will lift a pause request before it takes effect.
           'standby': Normal usage.
Set-speed: Any state except undefined, though ready may not be meaningful.
Complete:  Only a 'ready' job may accept a complete request.

=======
Changes
=======

(1)

To facilitate "nice" error checking, all five major block-job verb
interfaces in blockjob.c now support an errp parameter:

- block_job_user_cancel is added as a new interface.
- block_job_user_pause gains an errp paramter
- block_job_user_resume gains an errp parameter
- block_job_set_speed already had an errp parameter.
- block_job_complete already had an errp parameter.

(2)

block-job-pause and block-job-resume will no longer no-op when trying
to pause an already paused job, or trying to resume a job that isn't
paused. These functions will now report that they did not perform the
action requested because it was not possible.

iotests have been adjusted to address this new behavior.

(3)

block-job-complete doesn't worry about checking !block_job_started,
because the permission table guards against this.

(4)

test-bdrv-drain's job implementation needs to announce that it is
'ready' now, in order to be completed.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow c9de40505f blockjobs: add state transition table
The state transition table has mostly been implied. We're about to make
it a bit more complex, so let's make the STM explicit instead.

Perform state transitions with a function that for now just asserts the
transition is appropriate.

Transitions:
Undefined -> Created: During job initialization.
Created   -> Running: Once the job is started.
                      Jobs cannot transition from "Created" to "Paused"
                      directly, but will instead synchronously transition
                      to running to paused immediately.
Running   -> Paused:  Normal workflow for pauses.
Running   -> Ready:   Normal workflow for jobs reaching their sync point.
                      (e.g. mirror)
Ready     -> Standby: Normal workflow for pausing ready jobs.
Paused    -> Running: Normal resume.
Standby   -> Ready:   Resume of a Standby job.

+---------+
|UNDEFINED|
+--+------+
   |
+--v----+
|CREATED|
+--+----+
   |
+--v----+     +------+
|RUNNING<----->PAUSED|
+--+----+     +------+
   |
+--v--+       +-------+
|READY<------->STANDBY|
+-----+       +-------+

Notably, there is no state presently defined as of this commit that
deals with a job after the "running" or "ready" states, so this table
will be adjusted alongside the commits that introduce those states.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
John Snow 75859b9420 blockjobs: model single jobs as transactions
model all independent jobs as single job transactions.

It's one less case we have to worry about when we add more states to the
transition machine. This way, we can just treat all job lifetimes exactly
the same. This helps tighten assertions of the STM graph and removes some
conditionals that would have been needed in the coming commits adding a
more explicit job lifetime management API.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
Peter Maydell 9cc7d0cf6a -----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaqD6PAAoJEH3vgQaq/DkO9FIP/3pAW3xJUDGYsONiebX1IbhA
 VpoQCcjks3cHD18AUoVHufayJBUVfed1LhYPP8xoDuSRmKs1xU1O9FknxMQaL+Dw
 kbliBY7GjN8A2EcCjW+ZwyNT/KpjyXXwuZ2PSnOSSiN3JK6wrLCzeZyKyOYewLCS
 u9fKscnqWkg+awbCfDlVs92AaBAKoOP9loOq6e2J/jVY8HSDGb2owRnsxaWg8gJ8
 J9BlnXENQ14jEwickD3sluPfWkhu9xh7cCocH8cfgXL5veGUELz0Ugx4RHcsAF9Q
 SVDg/EhRRN11cvOkLnlggETaLbGtEE64AL4HhjxzCLraHsnEazPDwFgetB9mOhhF
 Nqu8HuGcVvRgn89au89mxAvTSWX9KFq4oF8Vi+FZZHkLilRx6NJnMpUpd9zkSJDq
 yjR2/BV0A9Ep1gvWX/rhpPrN5dALYHcaxoiSB497Yj4SI2ZSyzfrneteYdPv4EEc
 3CSJ3l6NCGAE2dNXuVZTVqHyXOSl7mJQQmT53dtsSNipCMEsVr0mOx3DPNY26LIc
 DUdnX6JOyZPU0wzOj8xjFNV72/gBEkqVZ5p9UJ+lrIYwOsTobpzfDtYquu4asda8
 IN44mcbRCZRFIiZZOGEdnwf34vIpQKMiZAtszAaan9KXwTXV9LbipaomBEN88vUD
 IgI5XsZTfiD2uIjnREWv
 =ISfR
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging

# gpg: Signature made Tue 13 Mar 2018 21:11:43 GMT
# gpg:                using RSA key 7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/bitmaps-pull-request:
  iotests: add dirty bitmap postcopy test
  iotests: add dirty bitmap migration test
  migration: add postcopy migration of dirty bitmaps
  migration: allow qmp command migrate-start-postcopy for any postcopy
  migration: add is_active_iterate handler
  migration/qemu-file: add qemu_put_counted_string()
  migration: include migrate_dirty_bitmaps in migrate_postcopy
  qapi: add dirty-bitmaps migration capability
  migration: introduce postcopy-only pending
  dirty-bitmap: add locked state
  block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap
  block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap
  block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-16 14:15:18 +00:00
Peter Maydell 475fe4576f nbd patches for 2018-03-13
- Eric Blake: iotests: Fix stuck NBD process on 33
 - Vladimir Sementsov-Ogievskiy: 0/5 nbd server fixing and refactoring before BLOCK_STATUS
 - Eric Blake: nbd/server: Honor FUA request on NBD_CMD_TRIM
 - Stefan Hajnoczi: 0/2 block: fix nbd-server-stop crash after blockdev-snapshot-sync
 - Vladimir Sementsov-Ogievskiy: nbd block status base:allocation
 -----BEGIN PGP SIGNATURE-----
 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
 
 iQEcBAABCAAGBQJaqDklAAoJEKeha0olJ0NqRsAH+waQGLA8YPwxlnpRW2kulLfC
 dZXv/ocl2vGgxRrDLEL46xh1RUpapERHADk/Qun8reQpqLicd6p8VCuoOZFEj6QN
 Xo98JHrKKL6AZ1rVhWUzD8G6qwgL6FGq6Eb5ty/kanf2/0igwtHmu86nOgGyc9dz
 zelGPdIxyxIEjCNiLIN49iEFs+gk1hr8qp1TNMbnHlQh8moOYqdCJWNisOQowoEE
 soCJ4NLnvKBtnmrxDrvtkppQKW8ukDOG/q5BkSTvAIEBH/v0ioohFUNTFkC8vmjO
 8YAwlXAz6EpQuKxpEfl7vxaT19edrNIo55JO1/Gwzk50g4/Mt+AH2JM/msFcMKQ=
 =i+nR
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-03-13-v2' into staging

nbd patches for 2018-03-13

- Eric Blake: iotests: Fix stuck NBD process on 33
- Vladimir Sementsov-Ogievskiy: 0/5 nbd server fixing and refactoring before BLOCK_STATUS
- Eric Blake: nbd/server: Honor FUA request on NBD_CMD_TRIM
- Stefan Hajnoczi: 0/2 block: fix nbd-server-stop crash after blockdev-snapshot-sync
- Vladimir Sementsov-Ogievskiy: nbd block status base:allocation

# gpg: Signature made Tue 13 Mar 2018 20:48:37 GMT
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2018-03-13-v2:
  iotests: new test 209 for NBD BLOCK_STATUS
  iotests: add file_path helper
  iotests.py: tiny refactor: move system imports up
  nbd: BLOCK_STATUS for standard get_block_status function: client part
  block/nbd-client: save first fatal error in nbd_iter_error
  nbd: BLOCK_STATUS for standard get_block_status function: server part
  nbd/server: add nbd_read_opt_name helper
  nbd/server: add nbd_opt_invalid helper
  iotests: add 208 nbd-server + blockdev-snapshot-sync test case
  block: let blk_add/remove_aio_context_notifier() tolerate BDS changes
  nbd/server: Honor FUA request on NBD_CMD_TRIM
  nbd/server: refactor nbd_trip: split out nbd_handle_request
  nbd/server: refactor nbd_trip: cmd_read and generic reply
  nbd/server: fix: check client->closing before sending reply
  nbd/server: fix sparse read
  nbd/server: move nbd_co_send_structured_error up
  iotests: Fix stuck NBD process on 33

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-16 13:14:07 +00:00
Vladimir Sementsov-Ogievskiy 4f43e9535b dirty-bitmap: add locked state
Add special state, when qmp operations on the bitmap are disabled.
It is needed during bitmap migration. "Frozen" state is not
appropriate here, because it looks like bitmap is unchanged.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180207155837.92351-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13 17:05:00 -04:00
Vladimir Sementsov-Ogievskiy 044ee8e143 block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180207155837.92351-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13 17:04:54 -04:00
Vladimir Sementsov-Ogievskiy 604ab74bb5 block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap
Like other setters here these functions should take a lock.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180207155837.92351-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13 17:04:48 -04:00
Vladimir Sementsov-Ogievskiy 78a33ab587 nbd: BLOCK_STATUS for standard get_block_status function: client part
Minimal realization: only one extent in server answer is supported.
Flag NBD_CMD_FLAG_REQ_ONE is used to force this behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks, fix min_block check and 32-bit cap, use -1
instead of errno on failure in nbd_negotiate_simple_meta_context,
ensure that block status makes progress on success]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13 15:43:48 -05:00
Vladimir Sementsov-Ogievskiy 1e98efc029 block/nbd-client: save first fatal error in nbd_iter_error
It is ok, that fatal error hides previous not fatal, but hiding
first fatal error is a bad feature.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180312152126.286890-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13 15:38:56 -05:00
Stefan Hajnoczi d03654eacd block: let blk_add/remove_aio_context_notifier() tolerate BDS changes
Commit 2019ba0a01 ("block: Add AioContextNotifier functions to BB")
added blk_add/remove_aio_context_notifier() and implemented them by
passing through the bdrv_*() equivalent.

This doesn't work across bdrv_append(), which detaches child->bs and
re-attaches it to a new BlockDriverState.  When
blk_remove_aio_context_notifier() is called we will access the new BDS
instead of the one where the notifier was added!

>From the point of view of the blk_*() API user, changes to the root BDS
should be transparent.

This patch maintains a list of AioContext notifiers in BlockBackend and
adds/removes them from the BlockDriverState as needed.

Reported-by: Stefano Panella <spanella@gmail.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180306204819.11266-2-stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13 15:38:55 -05:00
Vladimir Sementsov-Ogievskiy e73a265e9f block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor()
Enabling bitmap successor is necessary to enable successors of bitmaps
being migrated before target vm start.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180207155837.92351-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13 15:33:59 -04:00
Daniel P. Berrangé 44acd46f60 block: include original filename when reporting invalid URIs
Consider passing a JSON based block driver to "qemu-img commit"

$ qemu-img commit 'json:{"driver":"qcow2","file":{"driver":"gluster",\
                  "volume":"gv0","path":"sn1.qcow2",
                  "server":[{"type":\
		  "tcp","host":"10.73.199.197","port":"24007"}]},}'

Currently it will commit the content and then report an incredibly
useless error message when trying to re-open the committed image:

  qemu-img: invalid URI
  Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

With this fix we get:

  qemu-img: invalid URI json:{"server.0.host": "10.73.199.197",
      "driver": "gluster", "path": "luks.qcow2", "server.0.type":
      "tcp", "server.0.port": "24007", "volume": "gv0"}

Of course the root cause problem still exists, but now we know
what actually needs fixing.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180206105204.14817-1-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-03-13 08:06:55 -04:00
Peter Maydell 12c06d6f96 Block layer patches
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaoqOgAAoJEH8JsnLIjy/W2MMP/1gj7CJgtSG9wIzyBHjSWQMy
 ofXEgRJO9t/smfUMlH2NdrW8P2LvYcmqOsEBkLJzCtl48fPexwtI/cunzVjutXcf
 VlpqKz/8uN4C9D6m8FN/5kKf65l+tnVqnCoJgwafY5uT7jmoC8LF1xO2jo8a+lJd
 0Dv6RxJUQq/tDR6OvO6aW4EzbOUcD4wkLvi/uz8+ZjV1BLSLlpdudejr6W9TnJY/
 EGFedbxqjPV7fIvMbodbFp0Ie8Aw0WEL8ttERboeR4jbA/o+PZVGpPtHsr/4V6QO
 Pgh6vH2rGavxFzwuCWEGhlLKGx66CGqqdTknm6lNJchepCvcfoYxjOPZv9FCaMUs
 enC/x43xSkCmkwBwKKxpXqu1vS5nGdMebAwRjstSIplypjv2YOwS1AiU5snaDwuk
 t9Gjkw0Wka5nySuYi43H2RPXmlWbh4T8DfQ6pOyJGvXGjm8t+f5BTaMtSWn6Iq2W
 F6r1UezQJBDnUbpFgsRg4AP+htPGDHgsOg7KzCCd/lBHwbjX7dkQlAYbBZZ2OBF+
 wQN5olDR6jsKIy2IlARNgNweZHW5UQa1cc+7HlVNNE5tqtkjo7aWPk/LhEzBCIHg
 sWG3VH2y3lQlaMzYh1v+jnGrFoq1ZJU4sbjaxvQX8czjmaQvPtbzKuZAovQ4pGwa
 g0SrWP6p9yLo0LXLuXBP
 =WDF4
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Fri 09 Mar 2018 15:09:20 GMT
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (56 commits)
  qemu-iotests: fix 203 migration completion race
  iotests: Tweak 030 in order to trigger a race condition with parallel jobs
  iotests: Skip test for ENOMEM error
  iotests: Mark all tests executable
  iotests: Test creating overlay when guest running
  qemu-iotests: Test ssh image creation over QMP
  qemu-iotests: Test qcow2 over file image creation with QMP
  block: Fail bdrv_truncate() with negative size
  file-posix: Fix no-op bdrv_truncate() with falloc preallocation
  ssh: Support .bdrv_co_create
  ssh: Pass BlockdevOptionsSsh to connect_to_ssh()
  ssh: QAPIfy host-key-check option
  ssh: Use QAPI BlockdevOptionsSsh object
  sheepdog: Support .bdrv_co_create
  sheepdog: QAPIfy "redundancy" create option
  nfs: Support .bdrv_co_create
  nfs: Use QAPI options in nfs_client_open()
  rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()
  rbd: Assign s->snap/image_name in qemu_rbd_open()
  rbd: Support .bdrv_co_create
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-12 10:08:09 +00:00
Kevin Wolf 89b259eeaa file-posix: Fix no-op bdrv_truncate() with falloc preallocation
If bdrv_truncate() is called, but the requested size is the same as
before, don't call posix_fallocate(), which returns -EINVAL for length
zero and would therefore make bdrv_truncate() fail.

The problem can be triggered by creating a zero-sized raw image with
'falloc' preallocation mode.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:48 +01:00
Kevin Wolf 4906da7e4d ssh: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to ssh, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:48 +01:00
Kevin Wolf 375f0b9266 ssh: Pass BlockdevOptionsSsh to connect_to_ssh()
Move the parsing of the QDict options up to the callers, in preparation
for the .bdrv_co_create implementation that directly gets a QAPI type.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf ec2f54182c ssh: QAPIfy host-key-check option
This makes the host-key-check option available in blockdev-add.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 16e4bdb1bf ssh: Use QAPI BlockdevOptionsSsh object
Create a BlockdevOptionsSsh object in connect_to_ssh() and take the
options from there. 'host_key_check' is still processed separately
because it's not in the schema yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 63fd65a0a5 sheepdog: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to sheepdog, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf a595e4bcca sheepdog: QAPIfy "redundancy" create option
The "redundancy" option for Sheepdog image creation is currently a
string that can encode one or two integers depending on its format,
which at the same time implicitly selects a mode.

This patch turns it into a QAPI union and converts the string into such
a QAPI object before interpreting the values.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf a1a42af422 nfs: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to nfs, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf c22a034545 nfs: Use QAPI options in nfs_client_open()
Using the QAPI visitor to turn all options into QAPI BlockdevOptionsNfs
simplifies the code a lot. It will also be useful for implementing the
QAPI based .bdrv_co_create callback.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf aa045c2d37 rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()
This is almost exactly the same code. The differences are that
qemu_rbd_connect() supports BlockdevOptionsRbd.server and that the cache
mode is set explicitly.

Supporting 'server' is a welcome new feature for image creation.
Caching is disabled by default, so leave it that way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf d41a55885d rbd: Assign s->snap/image_name in qemu_rbd_open()
Now that the options are already available in qemu_rbd_open() and not
only parsed in qemu_rbd_connect(), we can assign s->snap and
s->image_name there instead of passing the fields by reference to
qemu_rbd_connect().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 1bebea37b4 rbd: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to rbd, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 4bfb274165 rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()
With the conversion to a QAPI options object, the function is now
prepared to be used in a .bdrv_co_create implementation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 4ff4504974 rbd: Remove non-schema options from runtime_opts
Instead of the QemuOpts in qemu_rbd_connect(), we want to use QAPI
objects. As a preparation, fetch those options directly from the QDict
that .bdrv_open() supports in the rbd driver and that are not in the
schema.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 3d9136f972 rbd: Factor out qemu_rbd_connect()
The code to establish an RBD connection is duplicated between open and
create. In order to be able to share the code, factor out the code from
qemu_rbd_open() as a first step.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 71c87815f9 rbd: Fix use after free in qemu_rbd_set_keypairs() error path
If we want to include the invalid option name in the error message, we
can't free the string earlier than that.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf ab8bda76a0 gluster: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to gluster, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 3766ef579c file-win32: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to file-win32, which
enables image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 927f11e131 file-posix: Support .bdrv_co_create
This adds the .bdrv_co_create driver callback to file, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf b0292b851b block: x-blockdev-create QMP command
This adds a synchronous x-blockdev-create QMP command that can create
qcow2 images on a given node name.

We don't want to block while creating an image, so this is not the final
interface in all aspects, but BlockdevCreateOptionsQcow2 and
.bdrv_co_create() are what they actually might look like in the end. In
any case, this should be good enough to test whether we interpret
BlockdevCreateOptions as we should.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf b76b4f6045 qcow2: Use visitor for options in qcow2_create()
Instead of manually creating the BlockdevCreateOptions object, use a
visitor to parse the given options into the QAPI object.

This involves translation from the old command line syntax to the syntax
mandated by the QAPI schema. Option names are still checked against
qcow2_create_opts, so only the old option names are allowed on the
command line, even if they are translated in qcow2_create().

In contrast, new option values are optionally recognised besides the old
values: 'compat' accepts 'v2'/'v3' as an alias for '0.10'/'1.1', and
'encrypt.format' accepts 'qcow' as an alias for 'aes' now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf e4b5dad826 qcow2: Handle full/falloc preallocation in qcow2_co_create()
Once qcow2_co_create() can be called directly on an already existing
node, we must provide the 'full' and 'falloc' preallocation modes
outside of creating the image on the protocol layer. Fortunately, we
have preallocated truncate now which can provide this functionality.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 60900b7bd9 qcow2: Use QCryptoBlockCreateOptions in qcow2_co_create()
Instead of passing the encryption format name and the QemuOpts down, use
the QCryptoBlockCreateOptions contained in BlockdevCreateOptions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf e1d74bc6c6 qcow2: Use BlockdevRef in qcow2_co_create()
Instead of passing a separate BlockDriverState* into qcow2_co_create(),
make use of the BlockdevRef that is included in BlockdevCreateOptions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 29ca9e450d qcow2: Pass BlockdevCreateOptions to qcow2_co_create()
All of the simple options are now passed to qcow2_co_create() in a
BlockdevCreateOptions object. Still missing: node-name and the
encryption options.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf cbf2b7c404 qcow2: Let qcow2_create() handle protocol layer
Currently, qcow2_create() only parses the QemuOpts and then calls
qcow2_co_create() for the actual image creation, which includes both the
creation of the actual file on the file system and writing a valid empty
qcow2 image into that file.

The plan is that qcow2_co_create() becomes the function that implements
the functionality for a future 'blockdev-create' QMP command, which only
creates the qcow2 layer on an already opened file node.

This is a first step towards that goal: Let's move out anything that
deals with the protocol layer from qcow2_co_create() into
qcow2_create().  This means that qcow2_co_create() doesn't need a file
name any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Kevin Wolf 8ca5bfdc46 qcow2: Rename qcow2_co_create2() to qcow2_co_create()
The functions originally known as qcow2_create() and qcow2_create2()
are now called qcow2_co_create_opts() and qcow2_co_create(), which
matches the names of the BlockDriver callbacks that they will implement
at the end of this patch series.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia 0c2ada8136 qcow2: Make qemu-img check detect corrupted L1 tables in snapshots
'qemu-img check' cannot detect if a snapshot's L1 table is corrupted.
This patch checks the table's offset and size and reports corruption
if the values are not valid.

This patch doesn't add code to fix that corruption yet, only to detect
and report it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia db5794f1f1 qcow2: Check snapshot L1 table in qcow2_snapshot_delete()
This function deletes a snapshot from disk, removing its entry from
the snapshot table, freeing its L1 table and decreasing the refcounts
of all clusters.

The L1 table offset and size are however not validated. If we use
invalid values in this function we'll probably corrupt the image even
more, so we should return an error instead.

We now have a function to take care of this, so let's use it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia a8475d7573 qcow2: Check snapshot L1 table in qcow2_snapshot_goto()
This function copies a snapshot's L1 table into the active one without
validating it first.

We now have a function to take care of this, so let's use it.

Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia c7a9d81d70 qcow2: Check snapshot L1 tables in qcow2_check_metadata_overlap()
The inactive-l2 overlap check iterates uses the L1 tables from all
snapshots, but it does not validate them first.

We now have a function to take care of this, so let's use it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia c9a442e450 qcow2: Check L1 table parameters in qcow2_expand_zero_clusters()
This function iterates over all snapshots of a qcow2 file in order to
expand all zero clusters, but it does not validate the snapshots' L1
tables first.

We now have a function to take care of this, so let's use it.

We can also take the opportunity to replace the sector-based
bdrv_read() with bdrv_pread().

Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia 314e8d3928 qcow2: Check L1 table offset in qcow2_snapshot_load_tmp()
This function checks that the size of a snapshot's L1 table is not too
large, but it doesn't validate the offset.

We now have a function to take care of this, so let's use it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia 0cf0e5980b qcow2: Generalize validate_table_offset() into qcow2_validate_table()
This function checks that the offset and size of a table are valid.

While the offset checks are fine, the size check is too generic, since
it only verifies that the total size in bytes fits in a 64-bit
integer. In practice all tables used in qcow2 have much smaller size
limits, so the size needs to be checked again for each table using its
actual limit.

This patch generalizes this function by allowing the caller to specify
the maximum size for that table. In addition to that it allows passing
an Error variable.

The function is also renamed and made public since we're going to use
it in other parts of the code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Paolo Bonzini 2fd6163884 block: convert bdrv_check callback to coroutine_fn
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1516279431-30424-8-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Paolo Bonzini 2b148f392b block: convert bdrv_invalidate_cache callback to coroutine_fn
QED's bdrv_invalidate_cache implementation would like to reuse functions
that acquire/release the metadata locks.  Call it from coroutine context
to simplify the logic.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1516279431-30424-6-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Paolo Bonzini 9fb4dfc570 qed: make bdrv_qed_do_open a coroutine_fn
It is called from qed_invalidate_cache in coroutine context (incoming
migration runs in a coroutine), so it's cleaner if metadata is always
loaded from a coroutine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1516279431-30424-5-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Paolo Bonzini 1fafcd9368 qcow2: make qcow2_do_open a coroutine_fn
It is called from qcow2_invalidate_cache in coroutine context (incoming
migration runs in a coroutine), so it's cleaner if metadata is always
loaded from a coroutine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1516279431-30424-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Paolo Bonzini 1a0c2bfb03 qcow2: fix flushing after dirty bitmap metadata writes
update_header_sync itself does not need to flush the caches to disk.
The only paths that allocate clusters are:

- bitmap_list_store with in_place=false, called by update_ext_header_and_dir

- store_bitmap_data, called by store_bitmap

- store_bitmap, called by qcow2_store_persistent_dirty_bitmaps and
  followed by update_ext_header_and_dir

So in the end the central place where we need to flush the caches
is update_ext_header_and_dir.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Paolo Bonzini 8b220eb7c8 qcow2: introduce qcow2_write_caches and qcow2_flush_caches
They will be used to avoid recursively taking s->lock during
bdrv_open or bdrv_check.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1516279431-30424-7-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Daniel P. Berrange f87e08f9e2 block: implement the bdrv_reopen_prepare helper for LUKS driver
If the bdrv_reopen_prepare helper isn't provided, the qemu-img commit
command fails to re-open the base layer after committing changes into
it. Provide a no-op implementation for the LUKS driver, since there
is not any custom work that needs doing to re-open it.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Deepa Srinivasan c060332c76 block: Fix qemu crash when using scsi-block
Starting qemu with the following arguments causes qemu to segfault:
... -device lsi,id=lsi0 -drive file=iscsi:<...>,format=raw,if=none,node-name=
iscsi1 -device scsi-block,bus=lsi0.0,id=<...>,drive=iscsi1

This patch fixes blk_aio_ioctl() so it does not pass stack addresses to
blk_aio_ioctl_entry() which may be invoked after blk_aio_ioctl() returns. More
details about the bug follow.

blk_aio_ioctl() invokes blk_aio_prwv() with blk_aio_ioctl_entry as the
coroutine parameter. blk_aio_prwv() ultimately calls aio_co_enter().

When blk_aio_ioctl() is executed from within a coroutine context (e.g.
iscsi_bh_cb()), aio_co_enter() adds the coroutine (blk_aio_ioctl_entry) to
the current coroutine's wakeup queue. blk_aio_ioctl() then returns.

When blk_aio_ioctl_entry() executes later, it accesses an invalid pointer:
....
    BlkRwCo *rwco = &acb->rwco;

    rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
                             rwco->qiov->iov[0].iov_base);  <--- qiov is
                                                                 invalid here
...

In the case when blk_aio_ioctl() is called from a non-coroutine context,
blk_aio_ioctl_entry() executes immediately. But if bdrv_co_ioctl() calls
qemu_coroutine_yield(), blk_aio_ioctl() will return. When the coroutine
execution is complete, control returns to blk_aio_ioctl_entry() after the call
to blk_co_ioctl(). There is no invalid reference after this point, but the
function is still holding on to invalid pointers.

The fix is to change blk_aio_prwv() to accept a void pointer for the IO buffer
rather than a QEMUIOVector. blk_aio_prwv() passes this through in BlkRwCo and the
coroutine function casts it to QEMUIOVector or uses the void pointer directly.

Signed-off-by: Deepa Srinivasan <deepa.srinivasan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-03-08 15:43:11 +00:00
Peter Maydell 58e2e17dba Block layer patches
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJanYJPAAoJEH8JsnLIjy/WxjUQAJA+DTOmGXvaNpMs65BrU79K
 /r/iGVrzHv/RMLmrWMnqj96W9SnpMuiAP9hVLNsekqClY9q4ME4DpGcXhWfhSvF5
 FC51ehvFJdfo8cPorsevcqNj60iWebjcx3lFfUq2606UOyYih3oijYxr6gSwWbRc
 GAgdGMqsvGYpzgqAQVEWHUhaX0La49/OzY42aR+E+LCBNfTYvlydvyoc+tUTdIpW
 1eM/ASGndGsN0Cf2vxlbKgJ0/P6v+cRZuuIDhKZqre+YG+yM+pq7yZb+o7nf/P36
 TPR93BsT7FSVAizRK7VFRuPIynHpiaxYygrJERCXF0sxsV4OlKjpmt/uUPamWFh+
 46Jx2NK1AuAx87BdErgmA119ObO3oAPxK0+2p981obb6SphTbbPxDj6SOlYCt4mJ
 mhff4JtIiwCmDSckAwd2mkBI1Tvl9qqcELrpyd2t2eU4ec2vf7fPd85EsK/Mq6Kr
 dbfqFvjNaaMxChoqFgkHAveYJ7zYqRFI2IY5o9c1QyZehCGPWjScxHXZZYdpDl59
 YF9DkYQDOyvEX2jmMECaO1r/0nnO+BqQHu5ItJuTte9rjP9Q0do3iBISiIefewtf
 yji6/QNn2hFrnr1HPAwLFFC3kPgc8Mq8mIUb53j8vG/01KhVRCcnJm2K6D4IUwLZ
 S6ZnQJB97eE4y7YR5dNt
 =2axz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (38 commits)
  block: Fix NULL dereference on empty drive error
  qcow2: Replace align_offset() with ROUND_UP()
  block/ssh: Add basic .bdrv_truncate()
  block/ssh: Make ssh_grow_file() blocking
  block/ssh: Pull ssh_grow_file() from ssh_create()
  qemu-img: Make resize error message more general
  qcow2: make qcow2_co_create2() a coroutine_fn
  block: rename .bdrv_create() to .bdrv_co_create_opts()
  Revert "IDE: Do not flush empty CDROM drives"
  block: test blk_aio_flush() with blk->root == NULL
  block: add BlockBackend->in_flight counter
  block: extract AIO_WAIT_WHILE() from BlockDriverState
  aio: rename aio_context_in_iothread() to in_aio_context_home_thread()
  docs: document how to use the l2-cache-entry-size parameter
  specs/qcow2: Fix documentation of the compressed cluster descriptor
  iotest 033: add misaligned write-zeroes test via truncate
  block: fix write with zero flag set and iovector provided
  block: Drop unused .bdrv_co_get_block_status()
  vvfat: Switch to .bdrv_co_block_status()
  vpc: Switch to .bdrv_co_block_status()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	include/block/block.h
2018-03-06 11:20:44 +00:00
Kevin Wolf bfe1a14c18 block: Fix NULL dereference on empty drive error
blk_error_action() sends a BLOCK_IO_ERROR QMP event which includes the
node name of its root node. If the BlockBackend represents an empty
drive, there is no root node, so we should not try to access its node
name. Make the field optional in the event and include it only when
the BlockBackend isn't empty.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-05 18:45:32 +01:00
Markus Armbruster 112ed241f5 qapi: Empty out qapi-schema.json
The previous commit improved compile time by including less of the
generated QAPI headers.  This is impossible for stuff defined directly
in qapi-schema.json, because that ends up in headers that that pull in
everything.

Move everything but include directives from qapi-schema.json to new
sub-module qapi/misc.json, then include just the "misc" shard where
possible.

It's possible everywhere, except:

* monitor.c needs qmp-command.h to get qmp_init_marshal()

* monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need
  qapi-event.h to get enum QAPIEvent

Perhaps we'll get rid of those some other day.

Adding a type to qapi/migration.json now recompiles some 120 instead
of 2300 out of 5100 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-25-armbru@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02 13:45:50 -06:00
Markus Armbruster 9af2398977 Include less of the generated modular QAPI headers
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.

The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h.  Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.

To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects.  The next commit will
improve it further.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02 13:45:50 -06:00
Markus Armbruster 0dd13589b0 Include qapi/qmp/qerror.h exactly where needed
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-2-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02 13:14:08 -06:00
Alberto Garcia 9e029689e1 qcow2: Replace align_offset() with ROUND_UP()
The align_offset() function is equivalent to the ROUND_UP() macro so
there's no need to use the former. The ROUND_UP() name is also a bit
more explicit.

This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP()
because align_offset() already requires that the second parameter is a
power of two.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180215131008.5153-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-02 18:39:56 +01:00
Max Reitz 624f3006b8 block/ssh: Add basic .bdrv_truncate()
libssh2 does not seem to offer real truncation support, so we can only
grow files -- but that is better than nothing.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180214204915.7980-4-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-02 18:39:56 +01:00
Max Reitz bd8e0e32da block/ssh: Make ssh_grow_file() blocking
At runtime (that is, during a future ssh_truncate()), the SSH session is
non-blocking.  However, ssh_truncate() (or rather, bdrv_truncate() in
general) is not a coroutine, so this resize operation needs to block.

For ssh_create(), that is fine, too; the session is never set to
non-blocking anyway.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180214204915.7980-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-02 18:39:56 +01:00
Max Reitz 2b12a756ac block/ssh: Pull ssh_grow_file() from ssh_create()
If we ever want to offer even rudimentary truncation functionality for
ssh, we should put the respective code into a reusable function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180214204915.7980-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-02 18:39:56 +01:00
Stefan Hajnoczi c274393a3e qcow2: make qcow2_co_create2() a coroutine_fn
qcow2_create2() calls qemu_co_mutex_lock().  Only a coroutine_fn may
call another coroutine_fn.  In fact, qcow2_create2 is always called from
coroutine context.

Rename the function to add the "co" moniker and add coroutine_fn.

Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20170705102231.20711-3-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Stefan Hajnoczi efc75e2a4c block: rename .bdrv_create() to .bdrv_co_create_opts()
BlockDriver->bdrv_create() has been called from coroutine context since
commit 5b7e1542cf ("block: make
bdrv_create adopt coroutine").

Make this explicit by renaming to .bdrv_co_create_opts() and add the
coroutine_fn annotation.  This makes it obvious to block driver authors
that they may yield, use CoMutex, or other coroutine_fn APIs.
bdrv_co_create is reserved for the QAPI-based version that Kevin is
working on.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20170705102231.20711-2-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Stefan Hajnoczi 33f2a75777 block: add BlockBackend->in_flight counter
BlockBackend currently relies on BlockDriverState->in_flight to track
requests for blk_drain().  There is a corner case where
BlockDriverState->in_flight cannot be used though: blk->root can be NULL
when there is no medium.  This results in a segfault when the NULL
pointer is dereferenced.

Introduce a BlockBackend->in_flight counter for aio requests so it works
even when blk->root == NULL.

Based on a patch by Kevin Wolf <kwolf@redhat.com>.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Stefan Hajnoczi 7719f3c968 block: extract AIO_WAIT_WHILE() from BlockDriverState
BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop
activity while a condition evaluates to true.  This is used to implement
synchronous operations where it acts as a condvar between the IOThread
running the operation and the main loop waiting for the operation.  It
can also be called from the thread that owns the AioContext and in that
case it's just a nested event loop.

BlockBackend needs this behavior but doesn't always have a
BlockDriverState it can use.  This patch extracts BDRV_POLL_WHILE() into
the AioWait abstraction, which can be used with AioContext and isn't
tied to BlockDriverState anymore.

This feature could be built directly into AioContext but then all users
would kick the event loop even if they signal different conditions.
Imagine an AioContext with many BlockDriverStates, each time a request
completes any waiter would wake up and re-check their condition.  It's
nicer to keep a separate AioWait object for each condition instead.

Please see "block/aio-wait.h" for details on the API.

The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE()
and AioContext polling.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Anton Nefedov 18a59f03c3 block: fix write with zero flag set and iovector provided
The normal bdrv_co_pwritev() use is either
  - BDRV_REQ_ZERO_WRITE clear and iovector provided
  - BDRV_REQ_ZERO_WRITE set and iovector == NULL

while
  - the flag clear and iovector == NULL is an assertion failure
    in bdrv_co_do_zero_pwritev()
  - the flag set and iovector provided is in fact allowed
    (the flag prevails and zeroes are written)

However the alignment logic does not support the latter case so the padding
areas get overwritten with zeroes.

Currently, general functions like bdrv_rw_co() do provide iovector
regardless of flags. So, keep it supported and use bdrv_co_do_zero_pwritev()
alignment for it which also makes the code a bit more obvious anyway.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 636cb51258 block: Drop unused .bdrv_co_get_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all drivers have been updated to provide the
byte-based .bdrv_co_block_status(), we can delete the sector-based
interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake fba3998dae vvfat: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vvfat driver accordingly.  Note that we
can rely on the block driver having already clamped limits to our
block size, and simplify accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 2f83673b57 vpc: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vpc driver accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake c72080b9b8 vmdk: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vmdk driver accordingly.  Drop the
now-unused vmdk_find_index_in_cluster().

Also, fix a pre-existing bug: if find_extent() fails (unlikely,
since the block layer did a bounds check), then we must return a
failure, rather than 0.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 67635f6abe vdi: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vdi driver accordingly.  Note that the
TODO is already covered (the block layer guarantees bounds of its
requests), and that we can remove the now-unused s->block_sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake b80666bf84 vdi: Avoid bitrot of debugging code
Rework the debug define so that we always get -Wformat checking,
even when debugging is disabled.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 47943e9865 sheepdog: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the sheepdog driver accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake d41aa7e36f raw: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the raw driver accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake b8d739fd6f qed: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qed driver accordingly, taking the opportunity
to inline qed_is_allocated_cb() into its lone caller (the callback
used to be important, until we switched qed to coroutines).  There is
no intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake a320fb04b6 qcow2: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow2 driver accordingly.

For now, we are ignoring the 'want_zero' hint.  However, it should
be relatively straightforward to honor the hint as a way to return
larger *pnum values when we have consecutive clusters with the same
data/zero status but which differ only in having non-consecutive
mappings.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake d63b4c93e3 qcow: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow driver accordingly.  There is no
intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 8e0cf59d02 parallels: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.
For now, no optimizations are added based on the mapping hint.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 05c33f1021 null: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 92809c3600 iscsi: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the iscsi driver accordingly.  In this case,
it is handy to teach iscsi_co_block_status() to handle a NULL map
and file parameter, even though the block layer passes non-NULL
values, because we also call the function directly.  For now, there
are no optimizations done based on the want_zero flag.

We can also make the simplification of asserting that the block
layer passed in aligned values.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 04a408fbff iscsi: Switch iscsi_allocmap_update() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the allocmap (no semantic change).  Callers that already had bytes
available are simpler, and callers that now scale to bytes will be
easier to switch to byte-based in the future.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake ba059e7b17 iscsi: Switch cluster_sectors to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the cluster size in sectors, along with adding assertions that we
are not dividing by zero.

Improve some comment grammar while in the area.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 08c9e7735e gluster: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the gluster driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping find_allocation(), and merely state that
all bytes are allocated.

We can also drop redundant bounds checks that are already
guaranteed by the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake a290f08590 file-posix: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the file protocol driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping lseek(), and merely state that all bytes
are allocated.

We can also drop redundant bounds checks that are already
guaranteed by the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 3e4d0e72b7 block: Switch passthrough drivers to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the generic helpers, and all passthrough clients
(blkdebug, commit, mirror, throttle) accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake e3efee828b nvme: Drop pointless .bdrv_co_get_block_status()
Commit bdd6a90 has a bug: drivers should never directly set
BDRV_BLOCK_ALLOCATED, but only io.c should do that (as needed).
Instead, drivers should report BDRV_BLOCK_DATA if it knows that
data comes from this BDS.

But let's look at the bigger picture: semantically, the nvme
driver is similar to the nbd, null, and raw drivers (no backing
file, all data comes from this BDS).  But while two of those
other drivers have to supply the callback (null because it can
special-case BDRV_BLOCK_ZERO, raw because it can special-case
a different offset), in this case the block layer defaults are
good enough without the callback at all (similar to nbd).

So, fix the bug by deletion ;)

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake 86a3d5c688 block: Add .bdrv_co_block_status() callback
We are gradually moving away from sector-based interfaces, towards
byte-based. Now that the block layer exposes byte-based allocation,
it's time to tackle the drivers.  Add a new callback that operates
on as small as byte boundaries. Subsequent patches will then update
individual drivers, then finally remove .bdrv_co_get_block_status().

The new code also passes through the 'want_zero' hint, which will
allow subsequent patches to further optimize callers that only care
about how much of the image is allocated (want_zero is false),
rather than full details about runs of zeroes and which offsets the
allocation actually maps to (want_zero is true).  As part of this
effort, fix another part of the documentation: the claim in commit
4c41cb4 that BDRV_BLOCK_ALLOCATED is short for 'DATA || ZERO' is a
lie at the block layer (see commit e88ae2264), even though it is
how the bit is computed from the driver layer.  After all, there
are intentionally cases where we return ZERO but not ALLOCATED at
the block layer, when we know that a read sees zero because the
backing file is too short.  Note that the driver interface is thus
slightly different than the public interface with regards to which
bits will be set, and what guarantees are provided on input.

We also add an assertion that any driver using the new callback will
make progress (the only time pnum will be 0 is if the block layer
already handled an out-of-bounds request, or if there is an error);
the old driver interface did not provide this guarantee, which
could lead to some inf-loops in drastic corner-case failures.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Eric Blake fd8d372dd3 nbd: Honor server's advertised minimum block size
Commit 79ba8c98 (v2.7) changed the setting of request_alignment
to occur only during bdrv_refresh_limits(), rather than at at
bdrv_open() time; but at the time, NBD was unaffected, because
it still used sector-based callbacks, so the block layer
defaulted NBD to use 512 request_alignment.

Later, commit 70c4fb26 (also v2.7) changed NBD to use byte-based
callbacks, without setting request_alignment.  This resulted in
NBD using request_alignment of 1, which works great when the
server supports it (as is the case for qemu-nbd), but falls apart
miserably if the server requires alignment (but only if qemu
actually sends a sub-sector request; qemu-io can do it, but
most qemu operations still perform on sectors or larger).

Even later, the NBD protocol was updated to document that clients
should learn the server's minimum alignment during NBD_OPT_GO;
and recommended that clients should assume a minimum size of 512
unless the server understands NBD_OPT_GO and replied with a smaller
size.  Commit 081dd1fe (v2.10) attempted to do that, by assigning
request_alignment to whatever was learned from the server; but
it has two flaws: the assignment is done during bdrv_open() so
it gets unconditionally wiped out back to 1 during any later
bdrv_refresh_limits(); and the code is not using a default of 512
when the server did not report a minimum size.

Fix these issues by moving the assignment to request_alignment
to the right function, and by using a sane default when the
server does not advertise a minimum size.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180215032905.27146-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
2018-03-01 14:02:32 -06:00
Paolo Bonzini 78d8c99e29 block/nvme: fix Coverity reports
1) string not null terminated in sysfs_find_group_file

2) NULL pointer dereference and dead local variable in nvme_init.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>

Message-Id: <20180213015240.9352-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2018-03-01 15:21:46 +08:00
Alberto Garcia 1221fe6f63 qcow2: Allow configuring the L2 slice size
Now that the code is ready to handle L2 slices we can finally add an
option to allow configuring their size.

An L2 slice is the portion of an L2 table that is read by the qcow2
cache. Until now the cache was always reading full L2 tables, and
since the L2 table size is equal to the cluster size this was not very
efficient with large clusters. Here's a more detailed explanation of
why it makes sense to have smaller cache entries in order to load L2
data:

   https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html

This patch introduces a new command-line option to the qcow2 driver
named l2-cache-entry-size (cf. l2-cache-size). The cache entry size
has the same restrictions as the cluster size: it must be a power of
two and it has the same range of allowed values, with the additional
requirement that it must not be larger than the cluster size.

The L2 cache entry size (L2 slice size) remains equal to the cluster
size for now by default, so this feature must be explicitly enabled.
Although my tests show that 4KB slices consistently improve
performance and give the best results, let's wait and make more tests
with different cluster sizes before deciding on an optimal default.

Now that the cache entry size is not necessarily equal to the cluster
size we need to reflect that in the MIN_L2_CACHE_SIZE documentation.
That minimum value is a requirement of the COW algorithm: we need to
read two L2 slices (and not two L2 tables) in order to do COW, see
l2_allocate() for the actual code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: c73e5611ff4a9ec5d20de68a6c289553a13d2354.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia dd32c88108 qcow2: Rename l2_table in count_cow_clusters()
This function doesn't need any changes to support L2 slices, but since
it's now dealing with slices intead of full tables, the l2_table
variable is renamed for clarity.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 6107001fc79e6739242f1de7d191375e4f130aac.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia c26f10ba89 qcow2: Rename l2_table in count_contiguous_clusters_unallocated()
This function doesn't need any changes to support L2 slices, but since
it's now dealing with slices instead of full tables, the l2_table
variable is renamed for clarity.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 78bcc54bc632574dd0b900a77a00a1b6ffc359e6.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 13f893c47d qcow2: Rename l2_table in count_contiguous_clusters()
This function doesn't need any changes to support L2 slices, but since
it's now dealing with slices intead of full tables, the l2_table
variable is renamed for clarity.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 812b0c3505bb1687e51285dccf1a94f0cecb1f74.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia e4e7254829 qcow2: Rename l2_table in qcow2_alloc_compressed_cluster_offset()
This function doesn't need any changes to support L2 slices, but since
it's now dealing with slices instead of full tables, the l2_table
variable is renamed for clarity.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 0c5d4b9bf163aa3b49ec19cc512a50d83563f2ad.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00