qemu-e2k

Author	SHA1	Message	Date
Kevin Wolf	d0ee0204f4	block: Remove wrong bdrv_set_aio_context() calls The mirror and commit block jobs use bdrv_set_aio_context() to move their filter node into the right AioContext before hooking it up in the graph. Similarly, bdrv_open_backing_file() explicitly moves the backing file node into the right AioContext first. This isn't necessary any more, they get automatically moved into the right context now when attaching them. However, in the case of bdrv_open_backing_file() with a node reference, it's actually not only unnecessary, but even wrong: The unchecked bdrv_set_aio_context() changes the AioContext of the child node even if other parents require it to retain the old context. So this is not only a simplification, but a bug fix, too. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342 Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	132ada80c4	block: Adjust AioContexts when attaching nodes So far, we only made sure that updating the AioContext of a node affected the whole subtree. However, if a node is newly attached to a new parent, we also need to make sure that both the subtree of the node and the parent are in the same AioContext. This tries to move the new child node to the parent AioContext and returns an error if this isn't possible. BlockBackends now actually apply their AioContext to their root node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	d861ab3acf	block: Add BlockBackend.ctx This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kevin Wolf	97896a4887	block: Add Error to blk_set_aio_context() Add an Error parameter to blk_set_aio_context() and use bdrv_child_try_set_aio_context() internally to check whether all involved nodes can actually support the AioContext switch. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Julia Suvorova	2b02fd81de	block/linux-aio: Drop unused BlockAIOCB submission method Callback-based laio_submit() and laio_cancel() were left after rewriting Linux AIO backend to coroutines in hope that they would be used in other code that could bypass coroutines. They can be safely removed because they have not been used since that time. Signed-off-by: Julia Suvorova <jusual@mail.ru> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
Max Reitz	5cb2737e92	block/io: Delay decrementing the quiesce_counter When ending a drained section, bdrv_do_drained_end() currently first decrements the quiesce_counter, and only then actually ends the drain. The bdrv_drain_invoke(bs, false) call may cause graph changes. Say the graph change involves replacing an existing BB's ("blk") BDS (blk_bs(blk)) by @bs. Let us introducing the following values: - bs_oqc = old_quiesce_counter (so bs->quiesce_counter == bs_oqc - 1) - obs_qc = blk_bs(blk)->quiesce_counter (before bdrv_drain_invoke()) Let us assume there is no blk_pread_unthrottled() involved, so blk->quiesce_counter == obs_qc (before bdrv_drain_invoke()). Now replacing blk_bs(blk) by @bs will reduce blk->quiesce_counter by obs_qc (making it 0) and increase it by bs_oqc-1 (making it bs_oqc-1). bdrv_drain_invoke() returns and we invoke bdrv_parent_drained_end(). This will decrement blk->quiesce_counter by one, so it would be -1 -- were there not an assertion against that in blk_root_drained_end(). We therefore have to keep the quiesce_counter up at least until bdrv_drain_invoke() returns, so that bdrv_parent_drained_end() does the right thing for the parents @bs got during bdrv_drain_invoke(). But let us delay it even further, namely until bdrv_parent_drained_end() returns, because then it mirrors bdrv_do_drained_begin(): There, we first increment the quiesce_counter, then begin draining the parents, and then call bdrv_drain_invoke(). It makes sense to let bdrv_do_drained_end() unravel this exactly in reverse. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
Vladimir Sementsov-Ogievskiy	69f47505ee	block: avoid recursive block_status call if possible drv_co_block_status digs bs->file for additional, more accurate search for hole inside region, reported as DATA by bs since `5daa74a6eb`. This accuracy is not free: assume we have qcow2 disk. Actually, qcow2 knows, where are holes and where is data. But every block_status request calls lseek additionally. Assume a big disk, full of data, in any iterative copying block job (or img convert) we'll call lseek(HOLE) on every iteration, and each of these lseeks will have to iterate through all metadata up to the end of file. It's obviously ineffective behavior. And for many scenarios we don't need this lseek at all. However, lseek is needed when we have metadata-preallocated image. So, let's detect metadata-preallocation case and don't dig qcow2's protocol file in other cases. The idea is to compare allocation size in POV of filesystem with allocations size in POV of Qcow2 (by refcounts). If allocation in fs is significantly lower, consider it as metadata-preallocation case. 102 iotest changed, as our detector can't detect shrinked file as metadata-preallocation, which don't seem to be wrong, as with metadata preallocation we always have valid file length. Two other iotests have a slight change in their QMP output sequence: Active 'block-commit' returns earlier because the job coroutine yields earlier on a blocking operation. This operation is loading the refcount blocks in qcow2_detect_metadata_preallocation(). Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
Peter Maydell	62f6849e7a	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlztyykACgkQfe+BBqr8 OQ4EaA//QQVDpIBRcMN+LeKWeEs8VSLziPUrZuFvuhMEEnjnaU6gbKq8G8xbFQ62 JIHg0DBGhTt8ymE9Ay6O/cooR8F0z+XyfDr7UlpI7JL/Uwl7JguGKQrWUYBRMqCv Q2cLaWStLkfdkuW7Y3WRc16VEnIlizDxjRzfjE2ESYpuzD2fFsBY3KZbgbJwYwZw SujWUQ3MdsNdw5kDmerlrDUy7r/eyl2cLXyIt6ClHNoqq392oGMoUn4XbsaLnCWE H5s46qm33eXtvBHqxVGoOMAli5FwCnhwF+H3xg93jIG6vC/RXQYCIhlEmEwKyrU2 g2DWWe/8+9b0iX+zTIcAPTcn1pmjVivGRorOurP0AtMtjV/8PvV+hAQQeSg2ARB3 rLpXaEphD4WTwu7mYlZ5kX0qvX2SftaMU08k1IgR3mfo8Z3X9znVoFIv8HLlHuy+ OhCmwT5OWYw4mNABTXeBMH/Dcs9EcU4+T/KhAGLReHo18CSyjeT2xsT+XCsETagF KlAP88dP0EdJ9Oiccyb8as22u7ygKWIiDYPplBdb4SkKg/koQnYGDjeDAzB2vXS3 cGVhGJD2DBbcePA8iaCfWzsSCDOTBFQLa45uhPD3DnkAJylhecSsiDQP+IrLslK3 h/8v9e8MAlHMgrueSnS7foMDI9rdrTNsChuNCJWOOaUI/ZWnXFg= =kCrN -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Wed 29 May 2019 00:58:33 BST # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: iotests: test external snapshot with bitmap copying qapi: support external bitmaps in block-dirty-bitmap-merge migration/dirty-bitmaps: change bitmap enumeration method Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-05-30 12:10:27 +01:00
Vladimir Sementsov-Ogievskiy	eff0829b07	qapi: support external bitmaps in block-dirty-bitmap-merge Add new optional parameter making possible to merge bitmaps from different nodes. It is needed to maintain external snapshots during incremental backup chain history. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190517152111.206494-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-05-28 19:33:31 -04:00
Andrey Shinkevich	6388903e7c	qcow2-bitmap: initialize bitmap directory alignment Valgrind detects multiple issues in QEMU iotests when the memory is used without being initialized. Valgrind may dump lots of unnecessary reports what makes the memory issue analysis harder. Particularly, that is true for the aligned bitmap directory and can be seen while running the iotest #169. Padding the aligned space with zeros eases the pain. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-id: 1558961521-131620-1-git-send-email-andrey.shinkevich@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Anton Nefedov	c8bb23cbdb	qcow2: skip writing zero buffers to empty COW areas If COW areas of the newly allocated clusters are zeroes on the backing image, efficient bdrv_write_zeroes(flags=BDRV_REQ_NO_FALLBACK) can be used on the whole cluster instead of writing explicit zero buffers later in perform_cow(). iotest 060: write to the discarded cluster does not trigger COW anymore. Use a backing image instead. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Message-id: 20190516142749.81019-2-anton.nefedov@virtuozzo.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Alberto Garcia	b441dc71c0	block: Make bdrv_root_attach_child() unref child_bs on failure A consequence of the previous patch is that bdrv_attach_child() transfers the reference to child_bs from the caller to parent_bs, which will drop it on bdrv_close() or when someone calls bdrv_unref_child(). But this only happens when bdrv_attach_child() succeeds. If it fails then the caller is responsible for dropping the reference to child_bs. This patch makes bdrv_attach_child() take the reference also when there is an error, freeing the caller for having to do it. A similar situation happens with bdrv_root_attach_child(), so the changes on this patch affect both functions. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com [mreitz: Removed now superfluous BdrvChild * variable in bdrv_open_child()] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	ae6b12fa4c	block/backup: refactor: split out backup_calculate_cluster_size Split out cluster_size calculation. Move copy-bitmap creation above block-job creation, as we are going to share it with upcoming backup-top filter, which also should be created before actual block job creation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190429090842.57910-6-vsementsov@virtuozzo.com [mreitz: Dropped a paragraph from the commit message that was left over from a previous version] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	c334e897d0	block/backup: unify different modes code path Do full, top and incremental mode copying all in one place. This unifies the code path and helps further improvements. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190429090842.57910-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	9eb5a248f3	block/backup: refactor and tolerate unallocated cluster skipping Split allocation checking to separate function and reduce nesting. Consider bdrv_is_allocated() fail as allocated area, as copying more than needed is not wrong (and we do it anyway) and seems better than fail the whole job. And, most probably we will fail on the next read, if there are real problem with source. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190429090842.57910-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	a8389e315e	block/backup: move to copy_bitmap with granularity We are going to share this bitmap between backup and backup-top filter driver, so let's share something more meaningful. It also simplifies some calculations. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190429090842.57910-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	c2da3413c0	block/backup: simplify backup_incremental_init_copy_bitmap Simplify backup_incremental_init_copy_bitmap using the function bdrv_dirty_bitmap_next_dirty_area. Note: move to job->len instead of bitmap size: it should not matter but less code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190429090842.57910-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	8ac0f15f33	qcow2: do encryption in threads Do encryption/decryption in threads, like it is already done for compression. This improves asynchronous encrypted io. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190506142741.41731-9-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	5447c3a03f	qcow2: bdrv_co_pwritev: move encryption code out of the lock Encryption will be done in threads, to take benefit of it, we should move it out of the lock first. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190506142741.41731-8-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	f24196d388	qcow2: qcow2_co_preadv: improve locking Background: decryption will be done in threads, to take benefit of it, we should move it out of the lock first. But let's go further: it turns out, that only qcow2_get_cluster_offset() needs locking, so reduce locking to it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190506142741.41731-7-vsementsov@virtuozzo.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	6f13a316dd	qcow2-threads: split out generic path Move generic part out of qcow2_co_do_compress, to reuse it for encryption and rename things that would be shared with encryption path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190506142741.41731-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	0f5636c51c	qcow2-threads: qcow2_co_do_compress: protect queuing by mutex Drop dependence on AioContext lock. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190506142741.41731-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	269062efc8	qcow2-threads: use thread_pool_submit_co Use thread_pool_submit_co, instead of reinventing it here. Note, that thread_pool_submit_aio() never returns NULL, so checking it was an extra thing. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190506142741.41731-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	56e2f1d898	qcow2: add separate file for threaded data processing functions Move compression-on-threads to separate file. Encryption will be in it too. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190506142741.41731-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Vladimir Sementsov-Ogievskiy	9353db47c5	qcow2.h: add missing include qcow2.h depends on block_int.h. Compilation isn't broken currently only due to block_int.h always included before qcow2.h. Though, it seems better to directly include block_int.h in qcow2.h. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190506142741.41731-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-28 20:30:55 +02:00
Max Reitz	9c3db310ff	block/file-posix: Unaligned O_DIRECT block-status Currently, qemu crashes whenever someone queries the block status of an unaligned image tail of an O_DIRECT image: $ echo > foo $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on Offset Length Mapped to File qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `pnum && QEMU_IS_ALIGNED(pnum, align) && align > offset - aligned_offset' failed. This is because bdrv_co_block_status() checks that the result returned by the driver's implementation is aligned to the request_alignment, but file-posix can fail to do so, which is actually mentioned in a comment there: "[...] possibly including a partial sector at EOF". Fix this by rounding up those partial sectors. There are two possible alternative fixes: (1) We could refuse to open unaligned image files with O_DIRECT altogether. That sounds reasonable until you realize that qcow2 does necessarily not fill up its metadata clusters, and that nobody runs qemu-img create with O_DIRECT. Therefore, unpreallocated qcow2 files usually have an unaligned image tail. (2) bdrv_co_block_status() could ignore unaligned tails. It actually throws away everything past the EOF already, so that sounds reasonable. Unfortunately, the block layer knows file lengths only with a granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually would have to guess whether its file length information is inexact or whether the driver is broken. Fixing what raw_co_block_status() returns is the safest thing to do. There seems to be no other block driver that sets request_alignment and does not make sure that it always returns aligned values. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:57 +02:00
Kevin Wolf	9ff7f0df87	blockjob: Propagate AioContext change to all job nodes Block jobs require that all of the nodes the job is using are in the same AioContext. Therefore all BdrvChild objects of the job propagate .(can_)set_aio_context to all other job nodes, so that the switch is checked and performed consistently even if both nodes are in different subtrees. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Kevin Wolf	980b0f943a	block: Add blk_set_allow_aio_context_change() Some users (like block jobs) can tolerate an AioContext change for their BlockBackend. Add a function that tells the BlockBackend that it can allow changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Kevin Wolf	38475269d4	block: Implement .(can_)set_aio_ctx for BlockBackend bdrv_try_set_aio_context() currently fails if a BlockBackend is attached to a node because it doesn't implement the BdrvChildRole callbacks for AioContext management. We can allow changing the AioContext of monitor-owned BlockBackends as long as no device is attached to them. When setting the AioContext of the root node of a BlockBackend, we now need to pass blk->root as an ignored child because we don't want the root node to recursively call back into BlockBackend and execute blk_do_set_aio_context() a second time. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Alberto Garcia	41ae31e3d7	block: Use BDRV_REQUEST_MAX_BYTES instead of BDRV_REQUEST_MAX_SECTORS There are a few places in which we turn a number of bytes into sectors in order to compare the result against BDRV_REQUEST_MAX_SECTORS instead of using BDRV_REQUEST_MAX_BYTES directly. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Alberto Garcia	b6c246942b	qcow2: Define and use QCOW2_COMPRESSED_SECTOR_SIZE When an L2 table entry points to a compressed cluster the space used by the data is specified in 512-byte sectors. This size is independent from BDRV_SECTOR_SIZE and is specific to the qcow2 file format. The QCOW2_COMPRESSED_SECTOR_SIZE constant defined in this patch makes this explicit. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Max Reitz	50ba5b2d99	block/file-posix: Truncate in xfs_write_zeroes() XFS_IOC_ZERO_RANGE does not increase the file length: $ touch foo $ xfs_io -c 'zero 0 65536' foo $ stat -c "size=%s, blocks=%b" foo size=0, blocks=128 We do want writes beyond the EOF to automatically increase the file length, however. This is evidenced by the fact that iotest 061 is broken on XFS since qcow2's check implementation checks for blocks beyond the EOF. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Peter Maydell	01807c8b0e	Miscellaneous patches for 2019-05-13 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJc2RbiAAoJEDhwtADrkYZTfpAP/itXg+X9wRfMeerni6SAkgtz knaLtJNC7YvwH7W6TIkSf2QgMrg/nYzIXxxj6V+Y3Vdn4CD93C7ldItWkm33amlA SEhREHpAn4F3wg/LsOGYYgpuqF/wrUcZsmzudnslfMd3mM6Q9Q6J3q6mu8n1oRcl RRKlk++ElqyRdvmxFhddhPxk797Vuunh76vd3ARUFmzKs2n7CGkeBu+qbk41VqI9 YtjmWHO6BDY5b01PvjuWPir6n1yJsYcpfo3ZElZvPf5jQHMmO6fGB3SZc/PIWegq gAVeoXtwhNm+nywMpIv1wHQMkvRDZW0wrurIQBc4VGpH1Pa90dR9FNVZ8r0OZqPB aErPCdC7ED73uzJwzXKTnLxY0XDgdhsAsW7lFggANs6YyewZNcbDaVhZWsopTTK/ 3jBbddIw2RsfHNQgXlFVVzjZJGHBNHxFjAFASCKcapUWQwDKU42kQrS1GqxG56NI Lgi8Ce+Q0GsVF4wme96Oa/8EMRfmNvsHMfWQvmqGqA1OACSOf2PSGCeD618A5gq6 kV6wF4v5HdGFkc0x9Vr5ur7kv3eOhpzFzBM6XJXe3CyqnYrkNuBldkyGZBbrNY7G aW5sR26Is4m9i+7159cNB5LmnfQqtsscibkSC0UQiXcuWgevd6cdiF+0r1YuNp7C Faa2yPOHs4mHCjUwade9 =yKfZ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-misc-2019-05-13' into staging Miscellaneous patches for 2019-05-13 # gpg: Signature made Mon 13 May 2019 08:04:02 BST # gpg: using RSA key 3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-misc-2019-05-13: Clean up decorations and whitespace around header guards Normalize header guard symbol definition. Clean up ill-advised or unusual header guards Clean up header guards that don't match their file name target/xtensa: Clean up core-isa.h header guards linux-user/nios2 linux-user/riscv: Clean up header guards authz: Normalize #include "authz/trace.h" to "trace.h" Use #include "..." for our own headers, <...> for others Clean up includes Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-05-13 13:55:13 +01:00
Markus Armbruster	a8b991b52d	Clean up ill-advised or unusual header guards Leading underscores are ill-advised because such identifiers are reserved. Trailing underscores are merely ugly. Strip both. Our header guards commonly end in _H. Normalize the exceptions. Done with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190315145123.28030-7-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Changes to slirp/ dropped, as we're about to spin it off]	2019-05-13 08:58:55 +02:00
Alberto Garcia	433e8e3b22	qcow2: Remove BDRVQcow2State.cluster_sectors The last user of this field disappeared when we replace the sector-based bdrv_write() with the byte-based bdrv_pwrite(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-10 16:45:40 +02:00
Alberto Garcia	2e11d7562a	block: Remove bdrv_read() and bdrv_write() No one is using these functions anymore, all callers have switched to the byte-based bdrv_pread() and bdrv_pwrite() Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-10 16:45:40 +02:00
Alberto Garcia	e5a0a6784a	vvfat: Replace bdrv_{read,write}() with bdrv_{pread,pwrite}() There's only a couple of bdrv_read() and bdrv_write() calls left in the vvfat code, and they can be trivially replaced with the byte-based bdrv_pread() and bdrv_pwrite(). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-10 16:45:40 +02:00
Alberto Garcia	d4f189713f	vdi: Replace bdrv_{read,write}() with bdrv_{pread,pwrite}() There's only a couple of bdrv_read() and bdrv_write() calls left in the vdi code, and they can be trivially replaced with the byte-based bdrv_pread() and bdrv_pwrite(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-10 16:45:40 +02:00
Alberto Garcia	e3b4257d03	qcow2: Replace bdrv_write() with bdrv_pwrite() There's only one bdrv_write() call left in the qcow2 code, and it can be trivially replaced with the byte-based bdrv_pwrite(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-10 16:45:40 +02:00
Andrey Shinkevich	118f99442d	block/io.c: fix for the allocation failure On a file system used by the customer, fallocate() returns an error if the block is not properly aligned. So, bdrv_co_pwrite_zeroes() fails. We can handle that case the same way as it is done for the unsupported cases, namely, call to bdrv_driver_pwritev() that writes zeroes to an image for the unaligned chunk of the block. Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1554474244-553661-1-git-send-email-andrey.shinkevich@virtuozzo.com Message-Id: <1554474244-553661-1-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-05-10 10:53:21 +01:00
Alberto Garcia	6a63419980	commit: Use bdrv_append() in commit_start() This function combines bdrv_set_backing_hd() and bdrv_replace_node() so we can use it to simplify the code a bit in commit_start(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20190403143748.9790-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Max Reitz	21205c7c3b	block/ssh: Implement .bdrv_dirname() ssh_bdrv_dirname() is basically the generic bdrv_dirname(), except it takes care not to silently chop off any query string (i.e., host_key_check). Signed-off-by: Max Reitz <mreitz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 20190225190828.17726-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Max Reitz	b8c1f90118	block/ssh: Implement .bdrv_refresh_filename() This requires some changes to keep iotests 104 and 207 working. qemu-img info in 104 will now return a filename including the user name and the port, which need to be filtered by adjusting REMOTE_TEST_DIR in common.rc. This additional information has to be marked optional, however (which is simple as REMOTE_TEST_DIR is a regex), because otherwise 197 and 215 would fail: They use it (indirectly) to filter qemu-img create output which contains a backing filename they have passed to it -- which probably does not contain a user name or port number. The problem in 207 is a nice one to have: qemu-img info used to return json:{} filenames, but with this patch it returns nice plain ones. We now need to adjust the filtering to hide the user name (and port number while we are at it). The simplest way to do this is to include both in iotests.remote_filename() so that bdrv_refresh_filename() will not change it, and then iotests.img_info_log() will filter it correctly automatically. Signed-off-by: Max Reitz <mreitz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 20190225190828.17726-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Andrey Shinkevich	444b82369b	qcow2: discard bitmap when removed Bitmap data may take a lot of disk space, so it's better to discard it always. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-id: 1551346019-293202-1-git-send-email-andrey.shinkevich@virtuozzo.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [mreitz: Use the commit message proposed by Vladimir] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	54b10010eb	qcow2-refcount: don't mask corruptions under internal errors No reasons for not reporting found corruptions as corruptions in case of some internal errors, especially in case of just failed to fix l2 entry (and in this case, missed corruptions may influence comparing logic, when we calculate difference between corruptions fields of two results) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190227131433.197063-6-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	cbb51e9f93	qcow2-refcount: check_refcounts_l2: don't count fixed cluster as allocated Do not count a cluster which is fixed to be ZERO as allocated. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	1ef337b7a0	qcow2-refcount: check_refcounts_l2: reduce ignored overlaps Reduce number of structures ignored in overlap check: when checking active table ignore active tables, when checking inactive table ignore inactive ones. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	a5fff8d4b4	qcow2-refcount: avoid eating RAM qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat an unpredictable amount of memory on corrupted table entries, which are referencing regions far beyond the end of file. Prevent this, by skipping such regions from further processing. Interesting that iotest 138 checks exactly the behavior which we fix here. So, change the test appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	7e3e736cbd	qcow2-refcount: fix check_oflag_copied Increase corruptions_fixed only after successful fix. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	54277a2aab	block/qed: add missed coroutine_fn markers qed_read_table and qed_write_table use coroutine-only interfaces but are not marked coroutine_fn. Happily, they are called only from coroutine context, so we only need to add missed markers. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Alberto Garcia	065abf9f2b	commit: Make base read-only if there is an early failure You can reproduce this by passing an invalid filter-node-name (like "1234") to block-commit. In this case the base image is put in read-write mode but is never reset back to read-only. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	f4326aefcf	block/stream: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	08b6261f34	block/commit: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	607dbdc4e0	block/backup: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	a4072543cc	block/parallels: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	696e8cb292	block/qed: use buffer-based io Move to _co_ versions of io functions qed_read_table() and qed_write_table(), as we use qemu_co_mutex_unlock() anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	4ed3e0c486	block/qcow: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	b00cb15bda	block/qcow2: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Alberto Garcia	e1f4a37a49	qcow2: Fix error handling in the compression code This patch fixes a few things in the way error codes are handled in the qcow2 compression code: a) qcow2_co_pwritev_compressed() expects qcow2_co_compress() to only return -1 or -2 on failure, but this is not correct. Since the change from qcow2_compress() to qcow2_co_compress() in commit `ceb029cd6f` the new code can also return -EINVAL (although there does not seem to exist any code path that would cause that error in the current implementation). b) -1 and -2 are ad-hoc error codes defined in qcow2_compress(). This patch replaces them with standard constants from errno.h. c) Both qcow2_compress() and qcow2_co_do_compress() return a negative value on failure, but qcow2_co_pwritev_compressed() stores the value in an unsigned data type. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	db04524f82	qcow2: Fix qcow2_make_empty() with external data file make_completely_empty() is an optimisated path for bdrv_make_empty() where completely new metadata is created inside the image file instead of going through all clusters and discarding them. For an external data file, however, we actually need to do discard operations on the data file; just overwriting the qcow2 file doesn't get rid of the data. The necessary slow path with an explicit discard operation already exists for other cases. Use it for external data files, too. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	718c0fce2f	qcow2: Fix full preallocation with external data file preallocate_co() already gave the data file the full size without forwarding the requested preallocation mode to the protocol. When bdrv_co_truncate() was called later with the preallocation mode, the file didn't actually grow any more, so the data file stayed unallocated even if full preallocation was requested. Pass the right preallocation mode to preallocate_co() and remove the second bdrv_co_truncate() to fix this. As a side effect, the ugly one-byte write in preallocate_co() is replaced with a truncate call, now leaving the last block unallocated on the protocol level as it should be. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	360bd07471	qcow2: Add errp to preallocate_co() We'll add a bdrv_co_truncate() call in the next patch which can return an Error that we don't want to discard. So add an errp parameter to preallocate_co(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	f29fbf7c6b	qcow2: Avoid COW during metadata preallocation Limiting the allocation to INT_MAX bytes isn't particularly clever because it means that the final cluster will be a partial cluster which will be completed through a COW operation. This results in unnecessary data read and write requests which lead to an unwanted non-sparse filesystem block for metadata preallocation. Align the maximum allocation size down to the cluster size to avoid this situation. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Eric Blake	de38b5005e	qemu-img: Saner printing of large file sizes Disk sizes close to INT64_MAX cause overflow, for some pretty ridiculous output: $ ./nbdkit -U - memory size=$((2**63 - 512)) --run 'qemu-img info $nbd' image: nbd+unix://?socket=/tmp/nbdkitHSAzNz/socket file format: raw virtual size: -8388607T (9223372036854775296 bytes) disk size: unavailable But there's no reason to have two separate implementations of integer to human-readable abbreviation, where one has overflow and stops at 'T', while the other avoids overflow and goes all the way to 'E'. With this patch, the output now claims 8EiB instead of -8388607T, which really is the correct rounding of largest file size supported by qemu (we could go 511 bytes larger if we used byte-accurate sizing instead of rounding up to the next sector boundary, but that wouldn't change the human-readable result). Quite a few iotests need updates to expected output to match. Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Max Reitz <mreitz@redhat.com>	2019-04-30 15:29:00 +02:00
Stefano Garzarella	0cb98af218	block/vhdx: Use IEC binary prefixes for size constants Using IEC binary prefixes in order to make the code more readable, with the exception of DEFAULT_LOG_SIZE because it's passed to stringify(). Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Stefano Garzarella	e9991e29ea	block/vhdx: Remove redundant IEC binary prefixes definition IEC binary prefixes are already defined in "qemu/units.h", so we can remove redundant definitions in "block/vhdx.h". Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Sam Eiderman	7502be838e	vmdk: Set vmdk parent backing_format to vmdk Commit `b69864e5a` ("vmdk: Support version=3 in VMDK descriptor files") fixed the probe function to correctly guess vmdk descriptors with version=3. This solves the issue where vmdk snapshot with parent vmdk descriptor containing "version=3" would be treated as raw instead vmdk. In the future case where a new vmdk version is introduced, we will again experience this issue, even if the user will provide "-f vmdk" it will only apply to the tip image and not to the underlying "misprobed" parent image. The code in vmdk.c already assumes that the backing file of vmdk must be vmdk (see vmdk_is_cid_valid which returns 0 if backing file is not vmdk). So let's make it official by supplying the backing_format as vmdk. Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-By: Liran Alon <liran.alon@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Zhengui li	126734c4f7	vpc: unlock Coroutine lock to make IO submit Concurrently Concurrent IO becomes serial IO because of the qemu Coroutine lock, which reduce IO performance severely. So unlock Coroutine lock before bdrv_co_pwritev and bdrv_co_preadv to fix it. Signed-off-by: Zhengui li <lizhengui@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Markus Armbruster	e1ce7d747b	block/qapi: Clean up how we print to monitor or stdout bdrv_snapshot_dump(), bdrv_image_info_specific_dump(), bdrv_image_info_dump() and their helpers take an fprintf()-like callback and a FILE * to pass to it. hmp.c passes monitor_printf() cast to fprintf_function and the current monitor cast to FILE *. qemu-img.c and qemu-io-cmds.c pass fprintf and stdout. The type-punning is technically undefined behaviour, but works in practice. Clean up: drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-8-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Markus Armbruster	6b3048cee0	block/ssh: Do not report read/write/flush errors to the user Callbacks ssh_co_readv(), ssh_co_writev(), ssh_co_flush() report errors to the user with error_printf(). They shouldn't, it's their caller's job. Replace by a suitable trace point. While there, drop the unreachable !s->sftp case. Perhaps we should convert this part of the block driver interface to Error, so block drivers can pass more detail to their callers. Not today. Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190417190641.26814-3-armbru@redhat.com>	2019-04-17 21:21:49 +02:00
Kevin Wolf	93e32b3e20	qcow2: Fix preallocation bdrv_pwrite to wrong file With an external data file, preallocate_co() must write the final byte to the external data file, not to the qcow2 image file. This is harmless for preallocation of newly created images (only the qcow2 file size is increased to the virtual disk size while it should be much smaller), but with preallocated resize, it could in theory cause visible corruption if the metadata of the image is larger than the data (e.g. lots of bitmaps). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-16 16:23:24 +02:00
Alberto Garcia	20509c4b8b	block: freeze the backing chain earlier in stream_start() Commit `6585493369` added code to freeze the backing chain from 'top' to 'base' for the duration of the block-stream job. The problem is that the freezing happens too late in stream_start(): during the bdrv_reopen_set_read_only() call earlier in that function another job can jump in and remove the base image. If that happens we have an invalid chain and QEMU crashes. This patch puts the bdrv_freeze_backing_chain() call at the beginning of the function. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-02 12:04:44 +02:00
Vladimir Sementsov-Ogievskiy	696aaaed57	block/file-posix: do not fail on unlock bytes bdrv_replace_child() calls bdrv_check_perm() with error_abort on loosening permissions. However file-locking operations may fail even in this case, for example on NFS. And this leads to Qemu crash. Let's avoid such errors. Note, that we ignore such things anyway on permission update commit and abort. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-02 12:04:44 +02:00
Stefano Garzarella	de23e72bb7	block/gluster: limit the transfer size to 512 MiB Several versions of GlusterFS (3.12? -> 6.0.1) fail when the transfer size is greater or equal to 1024 MiB, so we are limiting the transfer size to 512 MiB to avoid this rare issue. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1691320 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-02 12:04:44 +02:00
Eric Blake	75d34eb98c	nbd/client: Trace server noncompliance on structured reads Just as we recently added a trace for a server sending block status that doesn't match the server's advertised minimum block alignment, let's do the same for read chunks. But since qemu 3.1 is such a server (because it advertised 512-byte alignment, but when serving a file that ends in data but is not sector-aligned, NBD_CMD_READ would detect a mid-sector change between data and hole at EOF and the resulting read chunks are unaligned), we don't want to change our behavior of otherwise tolerating unaligned reads. Note that even though we fixed the server for 4.0 to advertise an actual block alignment (which gets rid of the unaligned reads at EOF for posix files), we can still trigger it via other means: $ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file Arguably, that is a bug in the blkdebug block status function, for leaking a block status that is not aligned. It may also be possible to observe issues with a backing layer with smaller alignment than the active layer, although so far I have been unable to write a reliable iotest for that scenario. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190330165349.32256-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-04-01 08:58:04 -05:00
Eric Blake	4841211e0d	block: Add bdrv_get_request_alignment() The next patch needs access to a device's minimum permitted alignment, since NBD wants to advertise this to clients. Add an accessor function, borrowing from blk_get_max_transfer() for accessing a backend's block limits. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190329042750.14704-6-eblake@redhat.com>	2019-04-01 08:46:52 -05:00
Eric Blake	9cf638508c	nbd/client: Support qemu-img convert from unaligned size If an NBD server advertises a size that is not a multiple of a sector, the block layer rounds up that size, even though we set info.size to the exact byte value sent by the server. The block layer then proceeds to let us read or query block status on the hole that it added past EOF, which the NBD server is unlikely to be happy with. Fortunately, qemu as a server never advertizes an unaligned size, so we generally don't run into this problem; but the nbdkit server makes it easy to test: $ printf %1000d 1 > f1 $ ~/nbdkit/nbdkit -fv file f1 & pid=$! $ qemu-img convert -f raw nbd://localhost:10809 f2 $ kill $pid $ qemu-img compare f1 f2 Pre-patch, the server attempts a 1024-byte read, which nbdkit rightfully rejects as going beyond its advertised 1000 byte size; the conversion fails and the output files differ (not even the first sector is copied, because qemu-img does not follow ddrescue's habit of trying smaller reads to get as much information as possible in spite of errors). Post-patch, the client's attempts to read (and query block status, for new enough nbdkit) are properly truncated to the server's length, with sane handling of the hole the block layer forced on us. Although f2 ends up as a larger file (1024 bytes instead of 1000), qemu-img compare shows the two images to have identical contents for display to the guest. I didn't add iotests coverage since I didn't want to add a dependency on nbdkit in iotests. I also did NOT patch write, trim, or write zeroes - these commands continue to fail (usually with ENOSPC, but whatever the server chose), because we really can't write to the end of the file, and because 'qemu-img convert' is the most common case where we care about being tolerant (which is read-only). Perhaps we could truncate the request if the client is writing zeros to the tail, but that seems like more work, especially if the block layer is fixed in 4.1 to track byte-accurate sizing (in which case this patch would be reverted as unnecessary). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-5-eblake@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2019-04-01 08:32:44 -05:00
Eric Blake	a62a85ef5c	nbd/client: Report offsets in bdrv_block_status It is desirable for 'qemu-img map' to have the same output for a file whether it is served over file or nbd protocols. However, ever since we implemented block status for NBD (2.12), the NBD protocol forgot to inform the block layer that as the final layer in the chain, the offset is valid; without an offset, the human-readable form of qemu-img map gives up with the unhelpful: $ nbdkit -U - data data="1" size=512 --run 'qemu-img map $nbd' Offset Length Mapped to File qemu-img: File contains external, encrypted or compressed clusters. The --output=json form always works, because it is reporting the lower-level bdrv_block_status results directly rather than trying to filter out sparse ranges for human consumption - but now it also shows the offset member. With this patch, the human output changes to: Offset Length Mapped to File 0 0x200 0 nbd+unix://?socket=/tmp/nbdkitOxeoLa/socket This change is observable to several iotests. Fixes: `78a33ab5` Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 20:52:29 -05:00
Eric Blake	7da537f70d	nbd/client: Lower min_block for block-status, unaligned size We have a latent bug in our NBD client code, tickled by the brand new nbdkit 1.11.10 block status support: $ nbdkit --filter=log --filter=truncate -U - \ data data="1" size=511 truncate=64K logfile=/dev/stdout \ --run 'qemu-img convert $nbd /var/tmp/out' ... qemu-img: block/io.c:2122: bdrv_co_block_status: Assertion `pnum && QEMU_IS_ALIGNED(pnum, align) && align > offset - aligned_offset' failed. The culprit? Our implementation of .bdrv_co_block_status can return unaligned block status for any server that operates with a lower actual alignment than what we tell the block layer in request_alignment, in violation of the block layer's constraints. To date, we've been unable to trip the bug, because qemu as NBD server always advertises block sizing (at which point it is a server bug if the server sends unaligned status - although qemu 3.1 is such a server and I've sent separate patches for 4.0 both to get the server to obey the spec, and to let the client to tolerate server oddities at EOF). But nbdkit does not (yet) advertise block sizing, and therefore is not in violation of the spec for returning block status at whatever boundaries it wants, and those unaligned results can occur anywhere rather than just at EOF. While we are still wise to avoid sending sub-sector read/write requests to a server of unknown origin, we MUST consider that a server telling us block status without an advertised block size is correct. So, we either have to munge unaligned answers from the server into aligned ones that we hand back to the block layer, or we have to tell the block layer about a smaller alignment. Similarly, if the server advertises an image size that is not sector-aligned, we might as well assume that the server intends to let us access those tail bytes, and therefore supports a minimum block size of 1, regardless of whether the server supports block status (although we still need more patches to fix the problem that with an unaligned image, we can send read or block status requests that exceed EOF to the server). Again, qemu as server cannot trip this problem (because it rounds images to sector alignment), but nbdkit advertised unaligned size even before it gained block status support. Solve both alignment problems at once by using better heuristics on what alignment to report to the block layer when the server did not give us something to work with. Note that very few NBD servers implement block status (to date, only qemu and nbdkit are known to do so); and as the NBD spec mentioned block sizing constraints prior to documenting block status, it can be assumed that any future implementations of block status are aware that they must advertise block size if they want a minimum size other than 1. We've had a long history of struggles with picking the right alignment to use in the block layer, as evidenced by the commit message of `fd8d372d` (v2.12) that introduced the current choice of forced 512-byte alignment. There is no iotest coverage for this fix, because qemu can't provoke it, and I didn't want to make test 241 dependent on nbdkit. Fixes: `fd8d372d` Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2019-03-30 20:52:19 -05:00
Eric Blake	737d3f5244	nbd-client: Work around server BLOCK_STATUS misalignment at EOF The NBD spec is clear that a server that advertises a minimum block size should reply to NBD_CMD_BLOCK_STATUS with extents aligned accordingly. However, we know that the qemu NBD server implementation has had a corner-case bug where it is not compliant with the spec, present since the introduction of NBD_CMD_BLOCK_STATUS in qemu 2.12 (and unlikely to be patched in time for 4.0). Namely, when qemu is serving a file that is not a multiple of 512 bytes, it rounds the size advertised over NBD up to the next sector boundary (someday, I'd like to fix that to be byte-accurate, but it's a much bigger audit not appropriate for this release); yet if the final sector contains data prior to EOF, lseek(SEEK_HOLE) will point to the implicit hole mid-sector which qemu then reported over NBD. We are well within our rights to hang up on a server that can't follow the spec, but it is more useful to try and keep the connection alive in spite of the problem. Do so by tracing a message about the problem, and then either truncating the request back to an aligned boundary (if it covered more than the final sector) or widening it out to the full boundary with a forced status of data (since truncating would result in 0 bytes, but we have to make progress, and valid since data is a default-safe answer). And in practice, since the problem only happens on a sector that starts with data and ends with a hole, we are going to want to read that full sector anyway (where qemu as the server fills in the tail beyond EOF with appropriate NUL bytes). Easy reproduction: $ printf %1000d 1 > file $ qemu-nbd -f raw -t file & pid=$! $ qemu-img map --output=json -f raw nbd://localhost:10809 qemu-img: Could not read file metadata: Invalid argument $ kill $pid where the patched version instead succeeds with: [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true}] Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190326171317.4036-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	ebd82cd872	nbd: Permit simple error to NBD_CMD_BLOCK_STATUS The NBD spec is clear that when structured replies are active, a simple error reply is acceptable to any command except for NBD_CMD_READ. However, we were mistakenly requiring structured errors for NBD_CMD_BLOCK_STATUS, and hanging up on a server that gave a simple error (since qemu does not behave as such a server, we didn't notice the problem until now). Broken since its introduction in commit `78a33ab5` (v2.12). Noticed while debugging a separate failure reported by nbdkit while working out its initial implementation of BLOCK_STATUS, although it turns out that nbdkit also chose to send structured error replies for BLOCK_STATUS, so I had to manually provoke the situation by hacking qemu's server to send a simple error reply: \| diff --git i/nbd/server.c w/nbd/server.c \| index fd013a2817a..833288d7c45 100644 \| 00--- i/nbd/server.c \| +++ w/nbd/server.c \| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, \| "discard failed", errp); \| \| case NBD_CMD_BLOCK_STATUS: \| + return nbd_co_send_simple_reply(client, request->handle, ENOMEM, \| + NULL, 0, errp); \| if (!request->len) { \| return nbd_send_generic_reply(client, request->handle, -EINVAL, \| "need non-zero length", errp); \| Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20190325190104.30213-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	b29f3a3d2a	nbd: Don't lose server's error to NBD_CMD_BLOCK_STATUS When the server replies with a (structured []) error to NBD_CMD_BLOCK_STATUS, without any extent information sent first, the client code was blindly throwing away the server's error code and instead telling the caller that EIO occurred. This has been broken since its introduction in `78a33ab5` (v2.12, where we should have called: error_setg(&local_err, "Server did not reply with any status extents"); nbd_iter_error(&iter, false, -EIO, &local_err); to declare the situation as a non-fatal error if no earlier error had already been flagged, rather than just blindly slamming iter.err and iter.ret), although it is more noticeable since commit `7f86068d`, which actually tries hard to preserve the server's code thanks to a separate iter.request_ret. [] The spec is clear that the server is also permitted to reply with a simple error, but that's a separate fix. I was able to provoke this scenario with a hack to the server, then seeing whether ENOMEM makes it back to the caller: \| diff --git a/nbd/server.c b/nbd/server.c \| index fd013a2817a..29c7995de02 100644 \| --- a/nbd/server.c \| +++ b/nbd/server.c \| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, \| "discard failed", errp); \| \| case NBD_CMD_BLOCK_STATUS: \| + return nbd_send_generic_reply(client, request->handle, -ENOMEM, \| + "no status for you today", errp); \| if (!request->len) { \| return nbd_send_generic_reply(client, request->handle, -EINVAL, \| "need non-zero length", errp); \| -- Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190325190104.30213-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	a39286dd61	nbd: Tolerate some server non-compliance in NBD_CMD_BLOCK_STATUS The NBD spec states that NBD_CMD_FLAG_REQ_ONE (which we currently always use) should not reply with an extent larger than our request, and that the server's response should be exactly one extent. Right now, that means that if a server sends more than one extent, we treat the server as broken, fail the block status request, and disconnect, which prevents all further use of the block device. But while good software should be strict in what it sends, it should be tolerant in what it receives. While trying to implement NBD_CMD_BLOCK_STATUS in nbdkit, we temporarily had a non-compliant server sending too many extents in spite of REQ_ONE. Oddly enough, 'qemu-img convert' with qemu 3.1 failed with a somewhat useful message: qemu-img: Protocol error: invalid payload for NBD_REPLY_TYPE_BLOCK_STATUS which then disappeared with commit `d8b4bad8`, on the grounds that an error message flagged only at the time of coroutine teardown is pointless, and instead we should rely on the actual failed API to report an error - in other words, the 3.1 behavior was masking the fact that qemu-img was not reporting an error. That has since been fixed in the previous patch, where qemu-img convert now fails with: qemu-img: error while reading block status of sector 0: Invalid argument But even that is harsh. Since we already partially relaxed things in commit `acfd8f7a` to tolerate a server that exceeds the cap (although that change was made prior to the NBD spec actually putting a cap on the extent length during REQ_ONE - in fact, the NBD spec change was BECAUSE of the qemu behavior prior to that commit), it's not that much harder to argue that we should also tolerate a server that sends too many extents. But at the same time, it's nice to trace when we are being tolerant of server non-compliance, in order to help server writers fix their implementations to be more portable (if they refer to our traces, rather than just stderr). Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190323212639.579-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Kevin Wolf	738301e117	file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes We know that the kernel implements a slow fallback code path for BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it. The other operations we call in the context of .bdrv_co_pwrite_zeroes should usually be quick, so no modification should be needed for them. If we ever notice that there are additional problematic cases, we can still make these conditional as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	80f5c33ff3	block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers Filter drivers that support .bdrv_co_pwrite_zeroes can safely advertise BDRV_REQ_NO_FALLBACK because they just forward the request flags to their child node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	fe0480d629	block: Add BDRV_REQ_NO_FALLBACK For qemu-img convert, we want an operation that zeroes out the whole image if this can be done efficiently, but that returns an error otherwise so we don't write explicit zeroes and immediately overwrite them with the real data, potentially doubling the amount of data to be written. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	48ce986096	block: Remove error messages in bdrv_make_zero() There is only a single caller of bdrv_make_zero(), which is qemu-img convert. If the function fails, we just fall back to a different method of zeroing out blocks on the target image. There is no good reason to print error messages on stderr when the higher level operation will actually succeed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Markus Armbruster	a9779a3ab0	trace-events: Delete unused trace points Tracked down with cleanup-trace-events.pl. Funnies requiring manual post-processing: * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-5-armbru@redhat.com Message-Id: <20190314180929.27722-5-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Markus Armbruster	500016e5db	trace-events: Shorten file names in comments We spell out sub/dir/ in sub/dir/trace-events' comments pointing to source files. That's because when trace-events got split up, the comments were moved verbatim. Delete the sub/dir/ part from these comments. Gets rid of several misspellings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-3-armbru@redhat.com Message-Id: <20190314180929.27722-3-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Alberto Garcia	782b9d06bf	block: Make bdrv_{copy_on_read,crypto_luks,replication} static Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Sam Eiderman	b69864e5a8	vmdk: Support version=3 in VMDK descriptor files Commit `509d39aa22` added support for read only VMDKs of version 3. This commit fixes the probe function to correctly handle descriptors of version 3. This commit has two effects: 1. We no longer need to supply '-f vmdk' when pointing to descriptor files of version 3 in qemu/qemu-img command line arguments. 2. This fixes the scenario where a VMDK points to a parent version 3 descriptor file which is being probed as "raw" instead of "vmdk". Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Kevin Wolf	a0cf83639c	qcow2: Fix data file error condition in qcow2_co_create() We were trying to check whether bdrv_open_blockdev_ref() returned success, but accidentally checked the wrong variable. Spotted by Coverity (CID 1399703). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2019-03-19 15:49:29 +01:00
Sergio Lopez	5e771752a1	mirror: Confirm we're quiesced only if the job is paused or cancelled While child_job_drained_begin() calls to job_pause(), the job doesn't actually transition between states until it runs again and reaches a pause point. This means bdrv_drained_begin() may return with some jobs using the node still having 'busy == true'. As a consequence, block_job_detach_aio_context() may get into a deadlock, waiting for the job to be actually paused, while the coroutine servicing the job is yielding and doesn't get the opportunity to get scheduled again. This situation can be reproduced by issuing a 'block-commit' immediately followed by a 'device_del'. To ensure bdrv_drained_begin() only returns when the jobs have been paused, we change mirror_drained_poll() to only confirm it's quiesced when job->paused == true and there aren't any in-flight requests, except if we reached that point by a drained section initiated by the mirror/commit job itself. The other block jobs shouldn't need any changes, as the default drained_poll() behavior is to only confirm it's quiesced if the job is not busy or completed. Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Peter Maydell	dbbc277510	Pull request * Add 'drop-cache=on\|off' option to file-posix.c. The default is on. Disabling the option fixes a QEMU 3.0.0 performance regression when live migrating on the same host with cache.direct=off. -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJciOSEAAoJEJykq7OBq3PIVSUIAI6r2Mgoi+no4nle8Jf2nZ+W EnQXnNEFyJA0lKRtqQ2UILD9udVdKd/L1PZu5k/Il/Ralto9Yf3+62brekI7rsss c3Qusu4LUK6jom2RslRjRIaJ9GilQi/jWezKV/O0VlcsMVemgVHX008EIR+ea1U4 H0/u2kfu04PciKQ5MR2+6aacu9bfmyH1yM2no+aMN5dDu/38PV6JEsf0Zl2agowg opGepJ7YiDQsxH9IBXrbfm38mBrrY0K2vFzAb9BzTHfBPotGMNIZNJNM2FChRfoM sTjOIpZz3NDwPEUPQPZxp+7YKRFFYfse1oHtpyh4n1rMQksB019SCGlP9TBhrF0= =CH5G -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request * Add 'drop-cache=on\|off' option to file-posix.c. The default is on. Disabling the option fixes a QEMU 3.0.0 performance regression when live migrating on the same host with cache.direct=off. # gpg: Signature made Wed 13 Mar 2019 11:07:48 GMT # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: file-posix: add drop-cache=on\|off option Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-03-14 09:34:51 +00:00
Peter Maydell	523a2a42c3	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlyIFSwACgkQfe+BBqr8 OQ7wghAAm16eCEr57oTO7QXR3y8uVFsKqXBn9cNH6nbrFp2PUQSglwMDKBls1Z5m olF23X/JaqSlSmkL9BBuzDZ6Up+kkHKuxPq4/5RKXfiDI0pr3R0eqts0COAlaN9q Bew3ipj99m8gzMi2093AW4+Ob0N3658fuDTGLe1M1Uoy7CEg1QJ7rVOBBEui7vIl RbZ8l/Zmb4ldNpB3lnE4Nh9ue8fy0RAj3Nai161nCnNeXNF/VzD3Ye8bojSBbnux PIMX6/RWmykX4feIf9QP8apDpxX4HkyuPq5EdwT9PD8PwdyXPAXZtsYUNCuNtQuk n5VKFVgFYgqUclBeVHmrMYPU4K4iCFQp4/Fua7wzPEC0iG05NiiDv91oVkEJCp3L ManHeuGfNLCcXaIntKZhuJl1cK8yMM3yDww6/pPTehrPjcyvKa0NOqhQBExektcD R6q7maJRzFaxSxdcs+Zzuog9zESvH1mlJxQCKzeYhAP0kkxInyTELE/Vbx37xuqR RFfZYyVQ6x87Q/sxHx4EMiV97WUM8elZOQdSEC/okt5WUUNpgIu0WF9nSQ1VKZ8C CZmv5xh9ogfwvB/kOm6IVwNkLvVagJQcLwddORI5LLXLbSIUcuwVSuyMp/7iDtQ/ hnHkGs2mIJ2JUYbSSNsSJNs6oTurn8eTFCeGoYKJgd9l4QxaThU= =ekU+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Tue 12 Mar 2019 20:23:08 GMT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: (22 commits) tests/qemu-iotests: add bitmap resize test 246 block/qcow2-bitmap: Allow resizes with persistent bitmaps block/qcow2-bitmap: Don't check size for IN_USE bitmap docs/interop/qcow2: Improve bitmap flag in_use specification bitmaps: Fix typo in function name block/dirty-bitmaps: implement inconsistent bit block/dirty-bitmaps: disallow busy bitmaps as merge source block/dirty-bitmaps: prohibit removing readonly bitmaps block/dirty-bitmaps: prohibit readonly bitmaps for backups block/dirty-bitmaps: add block_dirty_bitmap_check function block/dirty-bitmap: add inconsistent status block/dirty-bitmaps: add inconsistent bit iotests: add busy/recording bit test to 124 blockdev: remove unused paio parameter documentation block/dirty-bitmaps: move comment block block/dirty-bitmaps: unify qmp_locked and user_locked calls block/dirty-bitmap: explicitly lock bitmaps with successors nbd: change error checking order for bitmaps block/dirty-bitmap: change semantics of enabled predicate block/dirty-bitmap: remove set/reset assertions against enabled bit ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # tests/qemu-iotests/group	2019-03-13 17:30:34 +00:00
Stefan Hajnoczi	f357fcd890	file-posix: add drop-cache=on\|off option Commit `dd577a26ff` ("block/file-posix: implement bdrv_co_invalidate_cache() on Linux") introduced page cache invalidation so that cache.direct=off live migration is safe on Linux. The invalidation takes a significant amount of time when the file is large and present in the page cache. Normally this is not the case for cross-host live migration but it can happen when migrating between QEMU processes on the same host. On same-host migration we don't need to invalidate pages for correctness anyway, so an option to skip page cache invalidation is useful. I investigated optimizing invalidation and detecting same-host migration, but both are hard to achieve so a user-visible option will suffice. As a bonus this option means that the cache invalidation feature will now be detectable by libvirt via QMP schema introspection. Suggested-by: Neil Skrypuch <neil@tembosocial.com> Tested-by: Neil Skrypuch <neil@tembosocial.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190307164941.3322-1-stefanha@redhat.com Message-Id: <20190307164941.3322-1-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-13 10:54:55 +00:00
Alberto Garcia	5019aece2a	block: Remove the AioContext parameter from bdrv_reopen_multiple() This parameter has been unused since `1a63a90750` Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	8a2ce0bc1e	block: Add a 'mutable_opts' field to BlockDriver If we reopen a BlockDriverState and there is an option that is present in bs->options but missing from the new set of options then we have to return an error unless the driver is able to reset it to its default value. This patch adds a new 'mutable_opts' field to BlockDriver. This is a list of runtime options that can be modified during reopen. If an option in this list is unspecified on reopen then it must be reset (or return an error). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	077e8e2018	block: Add 'keep_old_opts' parameter to bdrv_reopen_queue() The bdrv_reopen_queue() function is used to create a queue with the BDSs that are going to be reopened and their new options. Once the queue is ready bdrv_reopen_multiple() is called to perform the operation. The original options from each one of the BDSs are kept, with the new options passed to bdrv_reopen_queue() applied on top of them. For "x-blockdev-reopen" we want a function that behaves much like "blockdev-add". We want to ignore the previous set of options so that only the ones actually specified by the user are applied, with the rest having their default values. One of the things that we need is a way to tell bdrv_reopen_queue() whether we want to keep the old set of options or not, and that's what this patch does. All current callers are setting this new parameter to true and x-blockdev-reopen will set it to false. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	6585493369	block: Freeze the backing chain for the duration of the stream job Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	ef53dc09ed	block: Freeze the backing chain for the duration of the mirror job Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	df827336ab	block: Freeze the backing chain for the duration of the commit job Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	23dece19da	file-posix: Make auto-read-only dynamic Until now, with auto-read-only=on we tried to open the file read-write first and if that failed, read-only was tried. This is actually not good enough for libvirt, which gives QEMU SELinux permissions for read-write only as soon as it actually intends to write to the image. So we need to be able to switch between read-only and read-write at runtime. This patch makes auto-read-only dynamic, i.e. the file is opened read-only as long as no user of the node has requested write permissions, but it is automatically reopened read-write as soon as the first writer is attached. Conversely, if the last writer goes away, the file is reopened read-only again. bs->read_only is no longer set for auto-read-only=on files even if the file descriptor is opened read-only because it will be transparently upgraded as soon as a writer is attached. This changes the output of qemu-iotests 232. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	6ceabe6f77	file-posix: Prepare permission code for fd switching In order to be able to dynamically reopen the file read-only or read-write, depending on the users that are attached, we need to be able to switch to a different file descriptor during the permission change. This interacts with reopen, which also creates a new file descriptor and performs permission changes internally. In this case, the permission change code must reuse the reopen file descriptor instead of creating a third one. In turn, reopen can drop its code to copy file locks to the new file descriptor because that is now done when applying the new permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	a6aeca0ca5	file-posix: Lock new fd in raw_reopen_prepare() There is no reason why we can take locks on the new file descriptor only in raw_reopen_commit() where error handling isn't possible any more. Instead, we can already do this in raw_reopen_prepare(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	e0c9cf3a48	file-posix: Store BDRVRawState.reopen_state during reopen We'll want to access the file descriptor in the reopen_state while processing permission changes in the context of the repoen. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	5cec287025	file-posix: Factor out raw_reconfigure_getfd() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Vladimir Sementsov-Ogievskiy	cb8aac3783	qapi: drop x- from x-block-latency-histogram-set Drop x- and x_ prefixes for latency histograms and update version to 4.0 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:08 +01:00
John Snow	d19c6b36ff	block/qcow2-bitmap: Allow resizes with persistent bitmaps Since we now load all bitmaps into memory anyway, we can just truncate them in-memory and then flush them back to disk. Just in case, we will still check and enforce that this shortcut is valid -- i.e. that any bitmap described on-disk is indeed in-memory and can be modified. If there are any inconsistent bitmaps, we refuse to allow the truncate as we do not actually load these bitmaps into memory, and it isn't safe or reasonable to attempt to truncate corrupted data. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190311185147.52309-4-vsementsov@virtuozzo.com [vsementsov: drop bitmap flushing, fix block comments style] Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 14:57:38 -04:00
Vladimir Sementsov-Ogievskiy	bf5f0cf5d8	block/qcow2-bitmap: Don't check size for IN_USE bitmap We are going to allow image resize when there are persistent bitmaps. It may lead to appearing of inconsistent bitmaps (IN_USE=1) with inconsistent size. But we still want to load them as inconsistent. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190311185147.52309-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 14:50:28 -04:00
Eric Blake	796a3798ab	bitmaps: Fix typo in function name Commit `a88b179f` introduced the ability to set and query bitmap persistence, but with an atypical spelling. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20190308205845.25734-1-eblake@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	74da6b9435	block/dirty-bitmaps: implement inconsistent bit Set the inconsistent bit on load instead of rejecting such bitmaps. There is no way to un-set it; the only option is to delete the bitmap. Obvervations: - bitmap loading does not need to update the header for in_use bitmaps. - inconsistent bitmaps don't need to have their data loaded; they're glorified corruption sentinels. - bitmap saving does not need to save inconsistent bitmaps back to disk. - bitmap reopening DOES need to drop the readonly flag from inconsistent bitmaps to allow reopening of qcow2 files with non-qemu-owned bitmaps being eventually flushed back to disk. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190301191545.8728-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	cb8e58e3de	block/dirty-bitmaps: disallow busy bitmaps as merge source We didn't do any state checking on source bitmaps at all, so this adds inconsistent and busy checks. readonly is allowed, so you can still copy a readonly bitmap to a new destination to use it for operations like drive-backup. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	3ae96d6684	block/dirty-bitmaps: add block_dirty_bitmap_check function Instead of checking against busy, inconsistent, or read only directly, use a check function with permissions bits that let us streamline the checks without reproducing them in many places. Included in this patch are permissions changes that simply add the inconsistent check to existing permissions call spots, without addressing existing bugs. In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT, which checks against all three conditions. busy-only checks become BDRV_BITMAP_ALLOW_RO. Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	0064cfefa4	block/dirty-bitmap: add inconsistent status Even though the status field is deprecated, we still have to support it for a few more releases. Since this is a very new kind of bitmap state, it makes sense for it to have its own status field. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-3-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	b0f455599d	block/dirty-bitmaps: add inconsistent bit Add an inconsistent bit to dirty-bitmaps that allows us to report a bitmap as persistent but potentially inconsistent, i.e. if we find bitmaps on a qcow2 that have been marked as "in use". Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-2-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	1e6fddcd6f	block/dirty-bitmaps: move comment block Simply move the big status enum comment block to above the status function, and document it as being deprecated. The whole confusing block can get deleted in three releases time. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-9-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	27a1b301a4	block/dirty-bitmaps: unify qmp_locked and user_locked calls These mean the same thing now. Unify them and rename the merged call bdrv_dirty_bitmap_busy to indicate semantically what we are describing, as well as help disambiguate from the various _locked and _unlocked versions of bitmap helpers that refer to mutex locks. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	21d2376f26	block/dirty-bitmap: explicitly lock bitmaps with successors Instead of implying a user_locked/busy status, make it explicit. Now, bitmaps in use by migration, NBD or backup operations are all treated the same way with the same code paths. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	8b2e20f64f	block/dirty-bitmap: change semantics of enabled predicate Currently, the enabled predicate means something like: "the QAPI status of the bitmap is ACTIVE." After this patch, it should mean exclusively: "This bitmap is recording guest writes, and is allowed to do so." In many places, this is how this predicate was already used. Internal usages of the bitmap QPI can call user_locked to find out if the bitmap is in use by an operation. To accommodate this, modify the create_successor routine to now explicitly disable the parent bitmap at creation time. Justifications: 1. bdrv_dirty_bitmap_status suffers no change from the lack of 1:1 parity with the new predicates because of the order in which the predicates are checked. This is now only for compatibility. 2. bdrv_set_dirty() is unchanged: pre-patch, it was skipping bitmaps that were disabled or had a successor, while post-patch it is only skipping bitmaps that are disabled. To accommodate this, create_successor now ensures that any bitmap with a successor is explicitly disabled. 3. qcow2_store_persistent_dirty_bitmaps: No functional change. This function cares only about the literal enabled bit, and makes no effort to check if the bitmap is in-use or not. After this patch there are still no ways to produce an enabled bitmap with a successor. 4. block_dirty_bitmap_enable_prepare block_dirty_bitmap_disable_prepare init_dirty_bitmap_migration nbd_export_new These functions care about the literal enabled bit, and already check user_locked separately. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-5-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	c28ddbb07e	block/dirty-bitmap: remove set/reset assertions against enabled bit bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap are only used as an internal API by the mirror and migration areas of our code. These calls modify the bitmap, but do so at the behest of QEMU and not the guest. Presently, these bitmaps are always "enabled" anyway, but there's no reason they have to be. Modify these internal APIs to drop this assertion. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	50a47257f8	block/dirty-bitmaps: rename frozen predicate helper "Frozen" was a good description a long time ago, but it isn't adequate now. Rename the frozen predicate to has_successor to make the semantics of the predicate more clear to outside callers. In the process, remove some calls to frozen() that no longer semantically make sense. For bdrv_enable_dirty_bitmap_locked and bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU internals from performing this action when we only wished to prohibit QMP users from issuing these commands. All of the QMP API commands for bitmap manipulation already check against user_locked() to prohibit these actions. Several other assertions really want to check that the bitmap isn't in-use by another operation -- use the bitmap_user_locked function for this instead, which presently also checks for has_successor. This leaves some redundant checks of has_successor through different helpers that are addressed in forthcoming patches. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-3-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	4db6ceb0b5	block/dirty-bitmap: add recording and busy properties The current API allows us to report a single status, which we've defined as: Frozen: has a successor, treated as qmp_locked, may or may not be enabled. Locked: no successor, qmp_locked. may or may not be enabled. Disabled: Not frozen or locked, disabled. Active: Not frozen, locked, or disabled. The problem is that both "Frozen" and "Locked" mean nearly the same thing, and that both of them do not intuit whether they are recording guest writes or not. This patch deprecates that status field and introduces two orthogonal properties instead to replace it. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-2-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
Niels de Vos	0e3b891fef	gluster: the glfs_io_cbk callback function pointer adds pre/post stat args The glfs_*_async() functions do a callback once finished. This callback has changed its arguments, pre- and post-stat structures have been added. This makes it possible to improve caching, which is useful for Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first release that includes these new arguments. With an additional detection in ./configure, the new arguments can conditionally get included in the glfs_io_cbk handler. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 14:26:49 +01:00
Prasanna Kumar Kalever	e014dbe74e	gluster: Handle changed glfs_ftruncate signature New versions of Glusters libgfapi.so have an updated glfs_ftruncate() function that returns additional 'struct stat' structures to enable advanced caching of attributes. This is useful for file servers, not so much for QEMU. Nevertheless, the API has changed and needs to be adopted. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 14:26:49 +01:00
Philippe Mathieu-Daudé	d4cef0c67c	block/iscsi: Restrict Linux-specific code Some Linux specific code is missing guards, leading to build failure on OSX: $ sudo brew install libiscsi $ ./configure && make [...] CC block/iscsi.o qemu/block/iscsi.c:338:24: error: 'iscsi_aiocb_info' defined but not used [-Werror=unused-const-variable=] static const AIOCBInfo iscsi_aiocb_info = { ^~~~~~~~~~~~~~~~ qemu/block/iscsi.c:168:1: error: 'iscsi_schedule_bh' defined but not used [-Werror=unused-function] iscsi_schedule_bh(IscsiAIOCB *acb) ^~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Add guards to restrict this code for Linux. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190220000553.28438-1-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-03-11 16:33:49 +01:00
Kevin Wolf	6c3944dc62	qcow2: Implement data-file-raw create option Provide an option to force QEMU to always keep the external data file consistent as a standalone read-only raw image. At the moment, this means making sure that write_zeroes requests are forwarded to the data file instead of just updating the metadata, and checking that no backing file is used. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	9b890bdcb6	qcow2: Store data file name in the image Rather than requiring that the external data file node is passed explicitly when creating the qcow2 node, store the filename in the designated header extension during .bdrv_create and read it from there as a default during .bdrv_open. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	dcc98687f8	qcow2: Creating images with external data file This adds a .bdrv_create option to use an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	0e8c08be27	qcow2: Add basic data-file infrastructure This adds a .bdrv_open option to specify the external data file node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	e9f5b6deaa	qcow2: Support external data file in qemu-img check For external data files, data clusters must be excluded from the refcount calculations. Instead, an implicit refcount of 1 is assumed for the COPIED flag. Compressed clusters and internal snapshots are incompatible with external data files, so print an error if they are in use for images with an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	aa8b34c1b2	qcow2: Return error for snapshot operation with data file Internal snapshots and an external data file are incompatible because snapshots require refcounting and non-linear mapping. Return an error for all of the snapshot operations if an external data file is in use. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	966b000f49	qcow2: External file I/O This changes the qcow2 implementation to direct all guest data I/O to s->data_file rather than bs->file, while metadata I/O still uses bs->file. At the moment, this is still always the same, but soon we'll add options to set s->data_file to an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	37be14036b	qcow2: Prepare qcow2_co_block_status() for data file Offset 0 cannot be assumed to mean an unallocated cluster any more. Instead, the cluster type needs to be checked. *file must refer to the data file instead of the image file if a valid offset is returned from qcow2_co_block_status(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	77e023ff79	qcow2: Return 0/-errno in qcow2_alloc_compressed_cluster_offset() qcow2_alloc_compressed_cluster_offset() used to return the cluster offset for success and 0 for error. This doesn't only conflict with 0 as a valid host offset, but also loses the error code. Similar to the change made to qcow2_alloc_cluster_offset() for uncompressed clusters in commit `148da7ea9d`, make the function return 0/-errno and return the allocated cluster offset in a by-reference parameter. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	c6d619cc12	qcow2: Don't assume 0 is an invalid cluster offset The cluster allocation code uses 0 as an invalid offset that is used in case of errors or as "offset not yet determined". With external data files, a host cluster offset of 0 becomes valid, though. Define a constant INV_OFFSET (which is not cluster aligned and will therefore never be a valid offset) that can be used for such purposes. This removes the additional host_offset == 0 check that commit `ff52aab2df` introduced; the confusion between an invalid offset and (erroneous) allocation at offset 0 is removed with this change. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	b8c8353a38	qcow2: Prepare count_contiguous_clusters() for external data file Offset 0 can be valid for normal (allocated) clusters now, so use qcow2_get_cluster_type() instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	a4ea184d8a	qcow2: Prepare qcow2_get_cluster_type() for external data file Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	808c2bb4c4	qcow2: Pass bs to qcow2_get_cluster_type() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	93c2493646	qcow2: Basic definitions for external data files This adds basic constants, struct fields and helper function for external data file support to the implementation. QCOW2_INCOMPAT_MASK and QCOW2_AUTOCLEAR_MASK are not updated yet so that opening images with an external data file still fails (we don't handle them correctly yet). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	c5e86ebc11	qcow2: Simplify preallocation code Image creation already involves a bdrv_co_truncate() call, which allows to specify a preallocation mode. Just pass the right mode there and remove the code that is made redundant by this. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Alberto Garcia	af39bd0d9a	qcow2: Default to 4KB for the qcow2 cache entry size QEMU 2.12 (commit `1221fe6f63`) introduced a new setting called l2-cache-entry-size that allows making entries on the qcow2 L2 cache smaller than the cluster size. I have been performing several tests with different cluster and entry sizes and all of them show that reducing the entry size (aka L2 slice) consistently improves I/O performance, notably during random I/O (all tests done with sequential I/O show similar results). This is to be expected because loading and evicting an L2 slice is more expensive the larger the slice is. Here are some numbers on fully populated 40GB qcow2 images. The rightmost column represents the maximum L2 cache size in both cases. Cluster size = 64 KB \|-------------+--------------+--------------+--------------\| \| \| 1MB L2 cache \| 3MB L2 cache \| 5MB L2 cache \| \|-------------+--------------+--------------+--------------\| \| 4KB slices \| 6545 IOPS \| 12045 IOPS \| 55680 IOPS \| \| 16KB slices \| 5177 IOPS \| 9798 IOPS \| 56278 IOPS \| \| 64KB slices \| 2718 IOPS \| 5326 IOPS \| 57355 IOPS \| \|-------------+--------------+--------------+--------------\| Cluster size = 256 KB \|--------------+----------------+--------------+-----------------\| \| \| 512KB L2 cache \| 1MB L2 cache \| 1280KB L2 cache \| \|--------------+----------------+--------------+-----------------\| \| 4KB slices \| 8539 IOPS \| 21071 IOPS \| 55417 IOPS \| \| 64KB slices \| 3598 IOPS \| 9772 IOPS \| 57687 IOPS \| \| 256KB slices \| 1415 IOPS \| 4120 IOPS \| 58001 IOPS \| \|--------------+----------------+--------------+-----------------\| As can be seen in the numbers, the only exception to the rule is when the cache is large enough to hold all L2 tables. This is also to be expected because in this case no cache entry is ever evicted so reducing its size doesn't bring any benefit. This patch sets the default L2 cache entry size to 4KB except when the cache is large enough for the whole disk. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Peter Maydell	adf2e451f3	Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJcc/knAAoJEH8JsnLIjy/WptQP/3F8Lh52H4egXaP7NUUuDjQM AhqhuDAp/EZBS+xim9kLTogNJADe/rMWdSX/YB5aLpSPYbjasC66NgaLhd6QewgQ VIcsLUdlYAyZ5ZjJytimfMTLwm1X02RmVIe55y52DTY8LlfViZzOlf3qwqPm00ao EJB2cl8UJLM+PVEu59cCw3R0/06LY+WIJRB32d3tnCBRTkaJwfR9h4lrp/juVcFZ U+2eWU68KMbUHSYiWANowN+KRV3uPY4HVA98v3F0vDmcBxlVHOeBg6S+PcT7tK8p huzCMwcdwUyPMJgVs/+WBtUnbG0jN6SHUYmFLz859UMVgBnCw5tzBMf8qw1wOA4A Iw+zor27Pxj4IlxcLPp5f97YZ8k9acdMR2VKPH6xLJZ1JF+sKa54RfzESd5EJeIj Mfcp773H0lIaWcFJ6RY1F0L1E1ta7QigwNBiWMdYfh0a0EWHnDvGyYeaSPYEQ+rl e8bZOcfrYwVI7DTDiZOIkGA9D8DXEPDNp+sl6s1DxeY69D0NNaXTtCPqFNNAbFbd 20uD7yDRZlWq32cQB/K9D5cSkZRSOzdUpLfLU3nQU2+dz11x6OpM6m7DVboSrztD 1HtPPDzDEvH5dOP7ibd60s+ntjkSiNfNkUgnuVrBE/d/PocC1eHHpZt5V7f43Ofb RxVwH5+smzQ9nsNBfQR0 =gaah -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface # gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (71 commits) iotests: Skip 211 on insufficient memory vmdk: false positive of compat6 with hwversion not set iotests: add LUKS payload overhead to 178 qemu-img measure test qcow2: include LUKS payload overhead in qemu-img measure iotests.py: s/_/-/g on keys in qmp_log() iotests: Let 045 be run concurrently iotests: Filter SSH paths iotests.py: Filter filename in any string value iotests.py: Add is_str() iotests: Fix 207 to use QMP filters for qmp_log iotests: Fix 232 for LUKS iotests: Remove superfluous rm from 232 iotests: Fix 237 for Python 2.x iotests: Re-add filename filters iotests: Test json:{} filenames of internal BDSs block: BDS options may lack the "driver" option block/null: Generate filename even with latency-ns block/curl: Implement bdrv_refresh_filename() block/curl: Harmonize option defaults block/nvme: Fix bdrv_refresh_filename() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-02-26 19:04:47 +00:00
yuchenlin	26c9296c31	vmdk: false positive of compat6 with hwversion not set In vmdk_co_create_opts, when it finds hw_version is undefined, it will set it to 4, which misleading the compat6 and hwversion in vmdk_co_do_create. Simply set hw_version to NULL after free, let the logic in vmdk_co_do_create to decide the value of hw_version. This bug can be reproduced by: $ qemu-img convert -O vmdk -o subformat=streamOptimized,compat6 /home/yuchenlin/syno.qcow2 /home/yuchenlin/syno.vmdk qemu-img: /home/yuchenlin/syno.vmdk: error while converting vmdk: compat6 cannot be enabled with hwversion set Signed-off-by: yuchenlin <yuchenlin@synology.com> Message-id: 20190221110805.28239-1-yuchenlin@synology.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:28 +01:00
Stefan Hajnoczi	61914f8906	qcow2: include LUKS payload overhead in qemu-img measure LUKS encryption reserves clusters for its own payload data. The size of this area must be included in the qemu-img measure calculation so that we arrive at the correct minimum required image size. (Ab)use the qcrypto_block_create() API to determine the payload overhead. We discard the payload data that qcrypto thinks will be written to the image. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190218104525.23674-2-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:28 +01:00
Max Reitz	1e47cb7f52	block/null: Generate filename even with latency-ns While we cannot represent the latency-ns option in a filename, it is not a strong option so not being able to should not stop us from generating a filename nonetheless. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-30-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	937c007b6e	block/curl: Implement bdrv_refresh_filename() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-29-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	712b64e8f3	block/curl: Harmonize option defaults Both of the defaults we currently have in the curl driver are named based on a slightly different schema, let's unify that and call both CURL_BLOCK_OPT_${NAME}_DEFAULT. While at it, we can add a macro for the third option for which a default exists, namely "sslverify". Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-28-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	cc61b0740f	block/nvme: Fix bdrv_refresh_filename() Currently, nvme's bdrv_refresh_filename() is an exact copy of null's implementation. However, for null, "null-co://" and "null-aio://" are indeed valid filenames -- for nvme, they are not, as a device address is still required. The correct implementation should generate a filename of the form "nvme://[PCI address]/[namespace]" (as the comment above nvme_parse_filename() describes). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-27-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	998b3a1e5a	block: Purify .bdrv_refresh_filename() Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00

1 2 3 4 5 ...

4410 Commits