Commit Graph

288 Commits

Author SHA1 Message Date
Max Reitz 2e1990b26e block/mirror: Convert to coroutines
In order to talk to the source BDS (and maybe in the future to the
target BDS as well) directly, we need to convert our existing AIO
requests into coroutine I/O requests.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:48 +02:00
Max Reitz 4295c5fc61 block/mirror: Pull out mirror_perform()
When converting mirror's I/O to coroutines, we are going to need a point
where these coroutines are created.  mirror_perform() is going to be
that point.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18 17:04:43 +02:00
Kevin Wolf 89bd030533 block: Really pause block jobs on drain
We already requested that block jobs be paused in .bdrv_drained_begin,
but no guarantee was made that the job was actually inactive at the
point where bdrv_drained_begin() returned.

This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
uses it to make bdrv_drain_poll() consider block jobs using the node to
be drained.

For the test case to work as expected, we have to switch from
block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
considered active and must be waited for when draining the node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf 1266c9b9f5 job: Add error message for failing jobs
So far we relied on job->ret and strerror() to produce an error message
for failed jobs. Not surprisingly, this tends to result in completely
useless messages.

This adds a Job.error field that can contain an error string for a
failing job, and a parameter to job_completed() that sets the field. As
a default, if NULL is passed, we continue to use strerror(job->ret).

All existing callers are changed to pass NULL. They can be improved in
separate patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:01 +02:00
Kevin Wolf 30a5c887bf job: Move progress fields to Job
BlockJob has fields .offset and .len, which are actually misnomers today
because they are no longer tied to block device sizes, but just progress
counters. As such they make a lot of sense in generic Jobs.

This patch moves the fields to Job and renames them to .progress_current
and .progress_total to describe their function better.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf 2e1795b581 job: Add job_transition_to_ready()
The transition to the READY state was still performed in the BlockJob
layer, in the same function that sent the BLOCK_JOB_READY QMP event.

This patch brings the state transition to the Job layer and implements
the QMP event using a notifier called from the Job layer, like we
already do for other events related to state transitions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf 198c49cc8d job: Add job_yield()
This moves block_job_yield() to the Job layer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf 3d70ff53b6 job: Move completion and cancellation to Job
This moves the top-level job completion and cancellation functions from
BlockJob to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23 14:30:51 +02:00
Kevin Wolf 3453d97243 job: Move .complete callback to Job
This moves the .complete callback that tells a READY job to complete
from BlockJobDriver to JobDriver. The wrapper function job_complete()
doesn't require anything block job specific any more and can be moved
to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf b69f777dd9 job: Add job_drain()
block_job_drain() contains a blk_drain() call which cannot be moved to
Job, so add a new JobDriver callback JobDriver.drain which has a common
implementation for all BlockJobs. In addition to this we keep the
existing BlockJobDriver.drain callback that is called by the common
drain implementation for all block jobs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf 004e95df98 job: Convert block_job_cancel_async() to Job
block_job_cancel_async() did two things that were still block job
specific:

* Setting job->force. This field makes sense on the Job level, so we can
  just move it. While at it, rename it to job->force_cancel to make its
  purpose more obvious.

* Resetting the I/O status. This can't be moved because generic Jobs
  don't have an I/O status. What the function really implements is a
  user resume, except without entering the coroutine. Consequently, it
  makes sense to call the .user_resume driver callback here which
  already resets the I/O status.

  The old block_job_cancel_async() has two separate if statements that
  check job->iostatus != BLOCK_DEVICE_IO_STATUS_OK and job->user_paused.
  However, the former condition always implies the latter (as is
  asserted in block_job_iostatus_reset()), so changing the explicit call
  of block_job_iostatus_reset() on the former condition with the
  .user_resume callback on the latter condition is equivalent and
  doesn't need to access any BlockJob specific state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf 4ad351819b job: Move single job finalisation to Job
This moves the finalisation of a single job from BlockJob to Job.

Some part of this code depends on job transactions, and job transactions
call this code, we introduce some temporary calls from Job functions to
BlockJob ones. This will be fixed once transactions move to Job, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf bb02b65c7d job: Move BlockJobCreateFlags to Job
This renames the BlockJobCreateFlags constants, moves a few JOB_INTERNAL
checks to job_create() and the auto_{finalize,dismiss} fields from
BlockJob to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf b15de82867 job: Move pause/resume functions to Job
While we already moved the state related to job pausing to Job, the
functions to do were still BlockJob only. This commit moves them over to
Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf 5d43e86e11 job: Add job_sleep_ns()
There is nothing block layer specific about block_job_sleep_ns(), so
move the function to Job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf da01ff7f38 job: Move coroutine and related code to Job
This commit moves some core functions for dealing with the job coroutine
from BlockJob to Job. This includes primarily entering the coroutine
(both for the first and reentering) and yielding explicitly and at pause
points.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf 1908a5590c job: Move defer_to_main_loop to Job
Move the defer_to_main_loop functionality from BlockJob to Job.

The code can be simplified because we can use job->aio_context in
job_defer_to_main_loop_bh() now, instead of having to access the
BlockDriverState.

Probably taking the data->aio_context lock in addition was already
unnecessary in the old code because we didn't actually make use of
anything protected by the old AioContext except getting the new
AioContext, in case it changed between scheduling the BH and running it.
But it's certainly unnecessary now that the BDS isn't accessed at all
any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:50 +02:00
Kevin Wolf daa7f2f946 job: Move cancelled to Job
We cannot yet move the whole logic around job cancelling to Job because
it depends on quite a few other things that are still only in BlockJob,
but we can move the cancelled field at least.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf 80fa2c756b job: Add reference counting
This moves reference counting from BlockJob to Job.

In order to keep calling the BlockJob cleanup code when the job is
deleted via job_unref(), introduce a new JobDriver.free callback. Every
block job must use block_job_free() for this callback, this is asserted
in block_job_create().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf 252291eaea job: Add JobDriver.job_type
This moves the job_type field from BlockJobDriver to JobDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf 8e4c87000f job: Rename BlockJobType into JobType
QAPI types aren't externally visible, so we can rename them without
causing problems. Before we add a job type to Job, rename the enum
so it can be used for more than just block jobs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Kevin Wolf 33e9e9bd62 job: Create Job, JobDriver and job_create()
This is the first step towards creating an infrastructure for generic
background jobs that aren't tied to a block device. For now, Job only
stores its ID and JobDriver, the rest stays in BlockJob.

The following patches will move over more parts of BlockJob to Job if
they are meaningful outside the context of a block job.

BlockJob.driver is now redundant, but this patch leaves it around to
avoid unnecessary churn. The next patches will get rid of almost all of
its uses anyway so that it can be removed later with much less churn.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-23 14:30:49 +02:00
Max Reitz 228345bf5d block: Support BDRV_REQ_WRITE_UNCHANGED in filters
Update the rest of the filter drivers to support
BDRV_REQ_WRITE_UNCHANGED.  They already forward write request flags to
their children, so we just have to announce support for it.

This patch does not cover the replication driver because that currently
does not support flags at all, and because it just grabs the WRITE
permission for its children when it can, so we should be fine just
submitting the incoming WRITE_UNCHANGED requests as normal writes.

It also does not cover format drivers for similar reasons.  They all use
bdrv_format_default_perms() as their .bdrv_child_perm() implementation
so they just always grab the WRITE permission for their file children
whenever possible.  In addition, it often would be difficult to
ascertain whether incoming unchanging writes end up as unchanging writes
in their files.  So we just leave them as normal potentially changing
writes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180421132929.21610-7-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Kevin Wolf dee81d5111 blockjob: Introduce block_job_ratelimit_get_delay()
This gets us rid of more direct accesses to BlockJob fields from the
job drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf 18bb69287e blockjob: Implement block_job_set_speed() centrally
All block job drivers support .set_speed and all of them duplicate the
same code to implement it. Move that code to blockjob.c and remove the
now useless callback.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf f05fee508f blockjob: Move RateLimit to BlockJob
Every block job has a RateLimit, and they all do the exact same thing
with it, so it should be common infrastructure. Move the struct field
for a start.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:50 +02:00
Kevin Wolf 05df8a6a2b blockjob: Wrappers for progress counter access
Block job drivers are not expected to mess with the internals of the
BlockJob object, so provide wrapper functions for one of the cases where
they still do it: Updating the progress counter.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2018-05-15 16:11:49 +02:00
Max Reitz eb36639f7b block/mirror: Make cancel always cancel pre-READY
Commit b76e4458b1 made the mirror block
job respect block-job-cancel's @force flag: With that flag set, it would
now always really cancel, even post-READY.

Unfortunately, it had a side effect: Without that flag set, it would now
never cancel, not even before READY.  Considering that is an
incompatible change and not noted anywhere in the commit or the
description of block-job-cancel's @force parameter, this seems
unintentional and we should revert to the previous behavior, which is to
immediately cancel the job when block-job-cancel is called before source
and target are in sync (i.e. before the READY event).

Cc: qemu-stable@nongnu.org
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1572856
Reported-by: Yanan Fu <yfu@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180501220509.14152-2-mreitz@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-05-08 10:47:27 -04:00
Stefan Hajnoczi ddc4115efd block/mirror: honor ratelimit again
Commit b76e4458b1 ("block/mirror: change
the semantic of 'force' of block-job-cancel") accidentally removed the
ratelimit in the mirror job.

Reintroduce the ratelimit but keep the block-job-cancel force=true
behavior that was added in commit
b76e4458b1.

Note that block_job_sleep_ns() returns immediately when the job is
cancelled.  Therefore it's safe to unconditionally call
block_job_sleep_ns() - a cancelled job does not sleep.

This commit fixes the non-deterministic qemu-iotests 185 output.  The
test relies on the ratelimit to make the job sleep until the 'quit'
command is processed.  Previously the job could complete before the
'quit' command was received since there was no ratelimit.

Cc: Liang Li <liliang.opensource@gmail.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180424123527.19168-1-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-05-08 10:47:27 -04:00
Liang Li b76e4458b1 block/mirror: change the semantic of 'force' of block-job-cancel
When doing drive mirror to a low speed shared storage, if there was heavy
BLK IO write workload in VM after the 'ready' event, drive mirror block job
can't be canceled immediately, it would keep running until the heavy BLK IO
workload stopped in the VM.

Libvirt depends on the current block-job-cancel semantics, which is that
when used without a flag after the 'ready' event, the command blocks
until data is in sync.  However, these semantics are awkward in other
situations, for example, people may use drive mirror for realtime
backups while still wanting to use block live migration.  Libvirt cannot
start a block live migration while another drive mirror is in progress,
but the user would rather abandon the backup attempt as broken and
proceed with the live migration than be stuck waiting for the current
drive mirror backup to finish.

The drive-mirror command already includes a 'force' flag, which libvirt
does not use, although it documented the flag as only being useful to
quit a job which is paused.  However, since quitting a paused job has
the same effect as abandoning a backup in a non-paused job (namely, the
destination file is not in sync, and the command completes immediately),
we can just improve the documentation to make the force flag obviously
useful.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Reported-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Liang Li <liliangleo@didichuxing.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:39 +01:00
John Snow 75859b9420 blockjobs: model single jobs as transactions
model all independent jobs as single job transactions.

It's one less case we have to worry about when we add more states to the
transition machine. This way, we can just treat all job lifetimes exactly
the same. This helps tighten assertions of the STM graph and removes some
conditionals that would have been needed in the coming commits adding a
more explicit job lifetime management API.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19 12:01:24 +01:00
Eric Blake 3e4d0e72b7 block: Switch passthrough drivers to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the generic helpers, and all passthrough clients
(blkdebug, commit, mirror, throttle) accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02 18:39:07 +01:00
Paolo Bonzini 5bf1d5a73a blockjob: remove clock argument from block_job_sleep_ns
All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
support more than one clock when block_job_sleep_ns switches to a single
timer stored in the BlockJob struct.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-29 15:11:02 +01:00
Eric Blake 3182664220 block: Convert bdrv_get_block_status_above() to bytes
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status_above()
to bdrv_block_status_above() ensures that the compiler enforces that
all callers are updated.  Likewise, since it a byte interface allows
an offset mapping that might not be sector aligned, split the mapping
out of the return value and into a pass-by-reference parameter.  For
now, the io.c layer still assert()s that all uses are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status in the drivers.

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status(), plus
updates for the new split return interface.  But some code,
particularly bdrv_block_status(), gets a lot simpler because it no
longer has to mess with sectors.  Likewise, mirror code no longer
computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop
an assertion about alignment because the loop no longer depends on
alignment (never mind that we don't really have a driver that
reports sub-sector alignments, so it's not really possible to test
the effect of sub-sector mirroring).  Fix a neighboring assertion to
use is_power_of_2 while there.

For ease of review, bdrv_get_block_status() was tackled separately.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 7cfd527525 block: Make bdrv_round_to_clusters() signature more useful
In the process of converting sector-based interfaces to bytes,
I'm finding it easier to represent a byte count as a 64-bit
integer at the block layer (even if we are internally capped
by SIZE_MAX or even INT_MAX for individual transactions, it's
still nicer to not have to worry about truncation/overflow
issues on as many variables).  Update the signature of
bdrv_round_to_clusters() to uniformly use int64_t, matching
the signature already chosen for bdrv_is_allocated and the
fact that off_t is also a signed type, then adjust clients
according to the required fallout (even where the result could
now exceed 32 bits, no client is directly assigning the result
into a 32-bit value without breaking things into a loop first).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Eric Blake 298a1665a2 block: Allow NULL file for bdrv_get_block_status()
Not all callers care about which BDS owns the mapping for a given
range of the file.  This patch merely simplifies the callers by
consolidating the logic in the common call point, while guaranteeing
a non-NULL file to all the driver callbacks, for no semantic change.
The only caller that does not care about pnum is bdrv_is_allocated,
as invoked by vvfat; we can likewise add assertions that the rest
of the stack does not have to worry about a NULL pnum.

Furthermore, this will also set the stage for a future cleanup: when
a caller does not care about which BDS owns an offset, it would be
nice to allow the driver to optimize things to not have to return
BDRV_BLOCK_OFFSET_VALID in the first place.  In the case of fragmented
allocation (for example, it's fairly easy to create a qcow2 image
where consecutive guest addresses are not at consecutive host
addresses), the current contract requires bdrv_get_block_status()
to clamp *pnum to the limit where host addresses are no longer
consecutive, but allowing a NULL file means that *pnum could be
set to the full length of known-allocated data.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26 14:45:57 +02:00
Vladimir Sementsov-Ogievskiy ce960aa906 block/mirror: check backing in bdrv_mirror_top_flush
Backing may be zero after failed bdrv_append in mirror_start_job,
which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Vladimir Sementsov-Ogievskiy 18775ff326 block/mirror: check backing in bdrv_mirror_top_refresh_filename
Backing may be zero after failed bdrv_attach_child in
bdrv_set_backing_hd, which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Eric Blake 23ca459a45 mirror: Switch mirror_dirty_init() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake e0d7f73e63 dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere.  Making the change
will also make it easier to write the hold-out callers to use byte
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap.  Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 3b5d4df0c6 dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes
Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration.  Both
callers were already using the result as a bool, so make that
explicit.  Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 9a46dba7b7 dirty-bitmap: Change bdrv_get_dirty_count() to report bytes
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake f798184cfd dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later).  Update the interface to
do the scaling internally instead.

In qcow2-bitmap, the code was specifically checking for an error
return of -1.  To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value.  It's easy enough to make both
changes at the same time, even though either one in isolation
would work.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake 715a74d819 dirty-bitmap: Set iterator start by offset, not sector
All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.

Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration.  Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf e0995dc3da block: Add reopen_queue to bdrv_child_perm()
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26 14:46:23 +02:00
Manos Pitsidianakis f7cc69b326 block: add default implementations for bdrv_co_get_block_status()
bdrv_co_get_block_status_from_file() and
bdrv_co_get_block_status_from_backing() set *file to bs->file and
bs->backing respectively, so that bdrv_co_get_block_status() can recurse
to them. Future block drivers won't have to duplicate code to implement
this.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-04 18:31:13 +02:00
Fam Zheng 045a2f8254 mirror: Mark target BB as "force allow inactivate"
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170823134242.12080-4-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-23 10:21:55 -05:00
Kevin Wolf d3c8c67469 block: Skip implicit nodes in query-block/blockstats
Commits 0db832f and 6cdbceb introduced the automatic insertion of filter
nodes above the top layer of mirror and commit block jobs. The
assumption made there was that since libvirt doesn't do node-level
management of the block layer yet, it shouldn't be affected by added
nodes.

This is true as far as commands issued by libvirt are concerned. It only
uses BlockBackend names to address nodes, so any operations it performs
still operate on the root of the tree as intended.

However, the assumption breaks down when you consider query commands,
which return data for the wrong node now. These commands also return
information on some child nodes (bs->file and/or bs->backing), which
libvirt does make use of, and which refer to the wrong nodes, too.

One of the consequences is that oVirt gets wrong information about the
image size and stops the VM in response as long as a mirror or commit
job is running:

https://bugzilla.redhat.com/show_bug.cgi?id=1470634

This patch fixes the problem by hiding the implicit nodes created
automatically by the mirror and commit block jobs in the output of
query-block and BlockBackend-based query-blockstats as long as the user
doesn't indicate that they are aware of those nodes by providing a node
name for them in the QMP command to start the block job.

The node-based commands query-named-block-nodes and query-blockstats
with query-nodes=true still show all nodes, including implicit ones.
This ensures that users that are capable of node-level management can
still access the full information; users that only know BlockBackends
won't use these commands.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
2017-07-24 15:06:04 +02:00
Max Reitz 3a691c50f1 block: Add PreallocMode to blk_truncate()
blk_truncate() itself will pass that value to bdrv_truncate(), and all
callers of blk_truncate() just set the parameter to PREALLOC_MODE_OFF
for now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170613202107.10125-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11 17:45:01 +02:00
Eric Blake 51b0a48888 block: Make bdrv_is_allocated_above() byte-based
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, for the most part this patch is just the
addition of scaling at the callers followed by inverse scaling at
bdrv_is_allocated().  But some code, particularly stream_run(),
gets a lot simpler because it no longer has to mess with sectors.
Leave comments where we can further simplify by switching to
byte-based iterations, once later patches eliminate the need for
sector-aligned operations.

For ease of review, bdrv_is_allocated() was tackled separately.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:07 +02:00
Eric Blake fb2ef7919b mirror: Switch mirror_iteration() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of mirroring to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are both sector-aligned and multiples of the granularity).  Drop
the now-unused mirror_clip_sectors().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake ae4cc8777b mirror: Switch mirror_do_read() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function, preserving all existing semantics, and adding one more
assertion that things are still sector-aligned (so that conversions
to sectors in mirror_read_complete don't need to round).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake 782d97efec mirror: Switch mirror_cow_align() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and add mirror_clip_bytes() as a
counterpart to mirror_clip_sectors().  Some of the conversion is
a bit tricky, requiring temporaries to convert between units; it
will be cleared up in a following patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake 931e52607f mirror: Update signature of mirror_clip_sectors()
Rather than having a void function that modifies its input
in-place as the output, change the signature to reduce a layer
of indirection and return the result.

Suggested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake e6f2419389 mirror: Switch mirror_do_zero_or_discard() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake b436982f04 mirror: Switch MirrorBlockJob to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to the
buffer size.

Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
(necessary for interaction with sector-based dirty bitmaps, until
a later patch converts those to be byte-based) does not suffer from
truncation problems.

[checkpatch has a false positive on use of MIN() in this patch]

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake 5cb1a49e01 trace: Show blockjob actions via bytes, not sectors
Upcoming patches are going to switch to byte-based interfaces
instead of sector-based.  Even worse, trace_backup_do_cow_enter()
had a weird mix of cluster and sector indices.

The trace interface is low enough that there are no stability
guarantees, and therefore nothing wrong with changing our units,
even in cases like trace_backup_do_cow_skip() where we are not
changing the trace output.  So make the tracing uniformly use
bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake f3e4ce4af3 blockjob: Track job ratelimits via bytes, not sectors
The user interface specifies job rate limits in bytes/second.
It's pointless to have our internal representation track things
in sectors/second, particularly since we want to move away from
sector-based interfaces.

Fix up a doc typo found while verifying that the ratelimit
code handles the scaling difference.

Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be
cleaned up later when functions are converted to iterate over
images by bytes rather than by sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:06 +02:00
Eric Blake d5254033da block: Simplify use of BDRV_BLOCK_RAW
The lone caller that cares about a return of BDRV_BLOCK_RAW
(namely, io.c:bdrv_co_get_block_status) completely replaces the
return value, so there is no point in passing BDRV_BLOCK_DATA.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-07-10 13:18:05 +02:00
Manos Pitsidianakis f5a5ca7969 block: change variable names in BlockDriverState
Change the 'int count' parameter in *pwrite_zeros, *pdiscard related
functions (and some others) to 'int bytes', as they both refer to bytes.
This helps with code legibility.

Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26 14:54:46 +02:00
Paolo Bonzini b64bd51efa block: protect modification of dirty bitmaps with a mutex
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-17-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Paolo Bonzini 2119882c7e block: introduce dirty_bitmap_mutex
It protects only the list of dirty bitmaps; in the next patch we will
also protect their content.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170605123908.18777-15-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16 07:55:00 +08:00
Stefan Hajnoczi 0748b3526e Block layer patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJZLDGTAAoJEH8JsnLIjy/WjY0P/jtXF+SQKs9H695Uy+U+EUr5
 9w+lYTfjHXTUx9F5AKT4OAMww/uWSG5ywG8FZD8AJZyt1B8piDrSA7sW5YYgWCV4
 XD1HMzN6pasVGrchzgfBSW/K9BgUlvBEjqFrLCNVii0osa68J7vX3uJsLyRA+yPo
 ijFseJaB/b0KnAPG9JqHXc4DVQgU2IhZH3ZsgE6Utb+KUrm8/veiORhWQboBbfCW
 eyTF+YWee/rI+up4nKn9+Le57EUXWr3Y50BBxFZw1/4wQbv0WEhOIbl/Z4ggxNRR
 VoyXGMqOIZFlGQ7bGxDawuEwGKfcOFFEHj3rhPrs7RPnod/6X8Vxm0rAr2GNZoOW
 9tLMQKe4HLo3kVwX65AhzWd/WMukAd0MC/0oaULOjiAEdlWjflyEhx9H9uYefl5x
 kn77ZqQqIEL4aCYRBLJn9oJV3CIZbSd6cuKC9UPuXAwD8XDWTXUzT/aDMJUnfij7
 vhS1qks1Wyk37wIjTuuERwrr0K6y8e3De30NeU6LqQuzXvJ1MYSp49BA1FCgHfmT
 x5RWbTSjLdjZiIo7biE2dSwdOtsxsnuxX19utqfGqOmegNU7LykRtu6mFh9kJhNe
 5ladt1Mu//puZLQ/q4RH/J0pwkuS/OVrS3LBobcggPGHhUIiHhdNUI73bb7BK0+9
 ME5DGLm4/c/REB3Lidr9
 =kkU0
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Mon 29 May 2017 03:34:59 PM BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* kwolf/tags/for-upstream:
  block/file-*: *_parse_filename() and colons
  block: Fix backing paths for filenames with colons
  block: Tweak error message related to qemu-img amend
  qemu-img: Fix leakage of options on error
  qemu-img: copy *key-secret opts when opening newly created files
  qemu-img: introduce --target-image-opts for 'convert' command
  qemu-img: fix --image-opts usage with dd command
  qemu-img: add support for --object with 'dd' command
  qemu-img: Fix documentation of convert
  qcow2: remove extra local_error variable
  mirror: Drop permissions on s->target on completion
  nvme: Add support for Controller Memory Buffers
  iotests: 147: Don't test inet6 if not available
  qemu-iotests: Test streaming with missing job ID
  stream: fix crash in stream_start() when block_job_create() fails

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-30 14:15:15 +01:00
Kevin Wolf 63c8ef2890 mirror: Drop permissions on s->target on completion
This fixes an assertion failure that was triggered by qemu-iotests 129
on some CI host, while the same test case didn't seem to fail on other
hosts.

Essentially the problem is that the blk_unref(s->target) in
mirror_exit() doesn't necessarily mean that the BlockBackend goes away
immediately. It is possible that the job completion was triggered nested
in mirror_drain(), which looks like this:

    BlockBackend *target = s->target;
    blk_ref(target);
    blk_drain(target);
    blk_unref(target);

In this case, the write permissions for s->target are retained until
after blk_drain(), which makes removing mirror_top_bs fail for the
active commit case (can't have a writable backing file in the chain
without the filter driver).

Explicitly dropping the permissions first means that the additional
reference doesn't hurt and the job can complete successfully even if
called from the nested blk_drain().

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-05-29 15:37:26 +02:00
Paolo Bonzini 05b0d8e3b8 blockjob: introduce block_job_early_fail
Outside blockjob.c, block_job_unref is only used when a block job fails
to start, and block_job_ref is not used at all.  The reference counting
thus is pretty well hidden.  Introduce a separate function to be used
by block jobs; because block_job_ref and block_job_unref now become
static, move them earlier in blockjob.c.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170508141310.8674-4-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-24 16:38:51 -04:00
Max Reitz ed3d2ec98a block: Add errp to b{lk,drv}_truncate()
For one thing, this allows us to drop the error message generation from
qemu-img.c and blockdev.c and instead have it unified in
bdrv_truncate().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170328205129.15138-3-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-28 16:02:02 +02:00
Fam Zheng 78bbd910bb block: Make errp the last parameter of commit_active_start
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170421122710.15373-9-famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-04-24 09:13:44 +02:00
Fam Zheng 51ccfa2dbf mirror: Make errp the last parameter of mirror_start_job
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170421122710.15373-8-famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-04-24 09:13:44 +02:00
Fam Zheng 19dd29e8a7 mirror: Fix aio context of mirror_top_bs
It should be moved to the same context as source, before inserting to the
graph.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-04-07 14:44:06 +02:00
Max Reitz 7a25fcd056 block/mirror: Fix use-after-free
If @bs does not have any parents, the only reference to @mirror_top_bs
will be held by the BlockJob object after the bdrv_unref() following
block_job_create(). However, if block_job_create() fails, this reference
will not exist and @mirror_top_bs will have been deleted when we
goto fail.

The issue comes back at all later entries to the fail label: We delete
the BlockJob object before rolling back our changes to the node graph.
This means that we will delete @mirror_top_bs in the process.

All in all, whenever @bs does not have any parents and we go down the
fail path we will dereference @mirror_top_bs after it has been deleted.

Fix this by invoking bdrv_unref() only when block_job_create() was
successful and by bdrv_ref()'ing @mirror_top_bs in the fail path before
deleting the BlockJob object. Finally, bdrv_unref() it at the end of the
fail path after we actually no longer need it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-04-07 14:44:05 +02:00
Eric Blake 67adf4b398 trace: Fix backwards mirror_yield parameters
block/trace-events lists the parameters for mirror_yield
consistently with other mirror events (cnt just after s, like in
mirror_before_sleep; in_flight last, like in mirror_yield_in_flight).
But the callers were passing parameters in the wrong order, leading
to poor trace messages, including type truncation when there are
more than 4G dirty sectors involved.  Broken since its introduction
in commit bd48bde.

While touching this, ensure that all callers use the same type
(uint64_t) for cnt, as a later patch will enable the compiler to do
stricter type-checking.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-03-24 09:21:42 +00:00
Fam Zheng c1cef67251 block: Always call bdrv_child_check_perm first
bdrv_child_set_perm alone is not very usable because the caller must
call bdrv_child_check_perm first. This is already encapsulated
conveniently in bdrv_child_try_set_perm, so remove the other prototypes
from the header and fix the one wrong caller, block/mirror.c.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-03-17 12:54:06 +01:00
Kevin Wolf fd4a6493bb mirror: Implement .bdrv_refresh_filename
We want query-block to return the right filename, even if a mirror job
put a bdrv_mirror_top on top of the actual image format driver. Let
bdrv_mirror_top.bdrv_refresh_filename get the filename from its backing
file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-13 12:49:33 +01:00
Kevin Wolf 5fe31c25cc block: Fix error handling in bdrv_replace_in_backing_chain()
When adding an Error parameter, bdrv_replace_in_backing_chain() would
become nothing more than a wrapper around change_parent_backing_link().
So make the latter public, renamed as bdrv_replace_node(), and remove
bdrv_replace_in_backing_chain().

Most of the callers just remove a node from the graph that they just
inserted, so they can use &error_abort, but completion of a mirror job
with 'replaces' set can actually fail.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-07 14:53:28 +01:00
Kevin Wolf 88f9d1b3d2 mirror: Fix error path for dirty bitmap creation
mirror_top_bs must be removed from the graph again when creating the
dirty bitmap fails.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-07 14:53:28 +01:00
Kevin Wolf 0bf74767ff mirror: Fix permissions for removing mirror_top_bs
mirror_top_bs takes write permissions on its backing file, which can
make it impossible to attach that backing file node to another parent.
However, this is exactly what needs to be done in order to remove
mirror_top_bs from the backing chain. So give up the write permission
first.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-07 14:53:28 +01:00
Kevin Wolf 7d9fcb391c mirror: Fix permission problem with 'replaces'
The 'replaces' option of drive-mirror can be used to mirror a Quorum
node to a new image and then let the target image replace one of the
Quorum children. In order for this graph modification to succeed, the
mirror job needs to lift its restrictions on the target node first
before actually replacing the child.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-07 14:53:28 +01:00
Kevin Wolf b2c2832c61 block: Add Error parameter to bdrv_append()
Aborting on error in bdrv_append() isn't correct. This patch fixes it
and lets the callers handle failures.

Test case 085 needs a reference output update. This is caused by the
reversed order of bdrv_set_backing_hd() and change_parent_backing_link()
in bdrv_append(): When the backing file of the new node is set, the
parent nodes are still pointing to the old top, so the backing blocker
is now initialised with the node name rather than the BlockBackend name.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:47:51 +01:00
Kevin Wolf 12fa4af61f block: Add Error parameter to bdrv_set_backing_hd()
Not all callers of bdrv_set_backing_hd() know for sure that attaching
the backing file will be allowed by the permission system. Return the
error from the function rather than aborting.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:47:51 +01:00
Kevin Wolf 0db832f42e commit: Add filter-node-name to block-commit
Management tools need to be able to know about every node in the graph
and need a way to address them. Changing the graph structure was okay
because libvirt doesn't really manage the node level yet, but future
libvirt versions need to deal with both new and old version of qemu.

This new option to blockdev-commit allows the client to set a node-name
for the automatically inserted filter driver, and at the same time
serves as a witness for a future libvirt that this version of qemu does
automatically insert a filter driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:47:50 +01:00
Kevin Wolf 6cdbceb12c mirror: Add filter-node-name to blockdev-mirror
Management tools need to be able to know about every node in the graph
and need a way to address them. Changing the graph structure was okay
because libvirt doesn't really manage the node level yet, but future
libvirt versions need to deal with both new and old version of qemu.

This new option to blockdev-mirror allows the client to set a node-name
for the automatically inserted filter driver, and at the same time
serves as a witness for a future libvirt that this version of qemu does
automatically insert a filter driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:47:50 +01:00
Kevin Wolf 4ef85a9c23 mirror: Use real permissions in mirror/active commit block job
The mirror block job is mainly used for two different scenarios:
Mirroring to an otherwise unused, independent target node, or for active
commit where the target node is part of the backing chain of the source.

Similarly to the commit block job patch, we need to insert a new filter
node to keep the permissions correct during active commit.

Note that one change this implies is that job->blk points to
mirror_top_bs as its root now, and mirror_top_bs (rather than the actual
source node) contains the bs->job pointer. This requires qemu-img commit
to get the job by name now rather than just taking bs->job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
2017-02-28 20:47:46 +01:00
Kevin Wolf 76d554e20b blockjob: Add permissions to block_job_add_bdrv()
Block jobs don't actually do I/O through the the reference they create
with block_job_add_bdrv(), but they might want to use the permisssion
system to express what the block job does to intermediate nodes. This
adds permissions to block_job_add_bdrv() to provide the means to request
permissions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:37 +01:00
Kevin Wolf c6cc12bfa7 blockjob: Add permissions to block_job_create()
This functions creates a BlockBackend internally, so the block jobs need
to tell it what they want to do with the BB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:37 +01:00
Kevin Wolf d7086422b1 block: Add error parameter to blk_insert_bs()
Now that blk_insert_bs() requests the BlockBackend permissions for the
node it attaches to, it can fail. Instead of aborting, pass the errors
to the callers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:36 +01:00
Kevin Wolf 6d0eb64d5c block: Add permissions to blk_new()
We want every user to be specific about the permissions it needs, so
we'll pass the initial permissions as parameters to blk_new(). A user
only needs to call blk_set_perm() if it wants to change the permissions
after the fact.

The permissions are stored in the BlockBackend and applied whenever a
BlockDriverState should be attached in blk_insert_bs().

This does not include actually choosing the right set of permissions
everywhere yet. Instead, the usual FIXME comment is added to each place
and will be addressed in individual patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28 20:40:36 +01:00
John Snow 39c11580f3 block/mirror: fix broken sparseness detection
int64_t is in all likelihood the actual scalar type we want.
Yep, really.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1219541

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-02-27 14:02:31 -05:00
Kevin Wolf becc347e1c mirror: Resize active commit base in mirror_run()
This is more consistent with the commit block job, and it moves the code
to a place where we already have the necessary BlockBackends to resize
the base image when bdrv_truncate() is changed to require a BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-24 16:09:23 +01:00
Anton Nefedov 90ab48eb07 mirror: do not increase offset during initial zero_or_discard phase
If explicit zeroing out before mirroring is required for the target image,
it moves the block job offset counter to EOF, then offset and len counters
count the image size twice. There is no harm but stats are confusing,
specifically the progress of the operation is always reported as 99% by
management tools.

The patch skips offset increase for the first "technical" pass over the
image. This should not cause any further harm.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1486045515-8009-1-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-02-21 10:38:00 -05:00
Paolo Bonzini b9e413dd37 block: explicitly acquire aiocontext in aio callbacks that need it
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-16-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21 11:39:39 +00:00
Paolo Bonzini bdffb31d8e mirror: do not flush every time the disks are synced
This puts a huge strain on the disks when there are many concurrent
migrations.  With this patch we only flush twice: just before issuing
the event, and just before pivoting to the destination.  If management
will complete the job close to the BLOCK_JOB_READY event, the cost of
the second flush should be small anyway.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20161109162008.27287-2-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:49:26 -05:00
John Snow 5ccac6f186 blockjob: add block_job_start
Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
John Snow a7815a764c blockjob: add .start field
Add an explicit start field to specify the entrypoint. We already have
ownership of the coroutine itself AND managing the lifetime of the
coroutine, let's take control of creation of the coroutine, too.

This will allow us to delay creation of the actual coroutine until we
know we'll actually start a BlockJob in block_job_start. This avoids
the sticky question of how to "un-create" a Coroutine that hasn't been
started yet.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1478587839-9834-4-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-14 22:47:34 -05:00
John Snow c87621ea68 blockjobs: split interface into public/private, Part 1
To make it a little more obvious which functions are intended to be
public interface and which are intended to be for use only by jobs
themselves, split the interface into "public" and "private" files.

Convert blockjobs (e.g. block/backup) to using the private interface.
Leave blockdev and others on the public interface.

There are remaining uses of private state by qemu-img, and several
cases in blockdev.c and block/io.c where we grab job->blk for the
purposes of acquiring an AIOContext.

These will be corrected in future patches.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-7-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 08:04:56 -04:00
John Snow 8254b6d953 blockjob: centralize QMP event emissions
There's no reason to leave this to blockdev; we can do it in blockjobs
directly and get rid of an extra callback for most users.

All non-internal events, even those created outside of QMP, will
consistently emit events.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-5-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
John Snow 47970dfb0a Replication/Blockjobs: Create replication jobs as internal
Bubble up the internal interface to commit and backup jobs, then switch
replication tasks over to using this methodology.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-4-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
John Snow f81e0b4532 blockjobs: Allow creating internal jobs
Add the ability to create jobs without an ID.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1477584421-1399-3-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Fam Zheng 6f13acf97e block: Turn on "unmap" in active commit
We already specified BDRV_O_UNMAP when opening images in 'qemu-img
commit', but didn't turn on the "unmap" in the active commit job. This
patch fixes that so that zeroed clusters in top image can be discarded
which is desired in the virt-sparsify use case, where a temporary
overlay is created and fstrim'ed before commiting back, to free space in
the original image.

This also enables it for block-commit.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1474974892-5031-1-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01 07:55:57 -04:00
Alberto Garcia f3ede4b05d block: Block all intermediate nodes in commit_active_start()
When block-commit is launched without the top parameter, it uses
internally a mirror block job. In that case all intermediate nodes
between the active and base nodes must be blocked as well.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:38 +01:00
Alberto Garcia cee3c6b5ca block: Use block_job_add_bdrv() in mirror_start_job()
Use block_job_add_bdrv() instead of blocking all operations in
mirror_start_job() and unblocking them in mirror_exit().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-10-31 16:52:38 +01:00