The paravirtualized PAPR platform sometimes needs to restrict the guest to
using only some of the page sizes actually supported by the host's MMU.
At the moment this is handled in KVM specific code, but for consistency we
want to apply the same limitations to all accelerators.
This makes a start on this by providing a helper function in the cpu code
to allow platform code to remove some of the cpu's page size definitions
via a caller supplied callback.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The way we used to handle KVM allowable guest pagesizes for PAPR guests
required some convoluted checking of memory attached to the guest.
The allowable pagesizes advertised to the guest cpus depended on the memory
which was attached at boot, but then we needed to ensure that any memory
later hotplugged didn't change which pagesizes were allowed.
Now that we have an explicit machine option to control the allowable
maximum pagesize we can simplify this. We just check all memory backends
against that declared pagesize. We check base and cold-plugged memory at
reset time, and hotplugged memory at pre_plug() time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means
that every page that the guest puts in the pagetables must be truly
physically contiguous, not just GPA-contiguous. In effect this means that
an HPT guest can't use any pagesizes greater than the host page size used
to back its memory.
At present we handle this by changing what we advertise to the guest based
on the backing pagesizes. This is pretty bad, because it means the guest
sees a different environment depending on what should be host configuration
details.
As a start on fixing this, we add a new capability parameter to the
pseries machine type which gives the maximum allowed pagesizes for an
HPT guest. For now we just create and validate the parameter without
making it do anything.
For backwards compatibility, on older machine types we set it to the max
available page size for the host. For the 3.0 machine type, we fix it to
16, the intention being to only allow HPT pagesizes up to 64kiB by default
in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The changes are:
1. fixed broken_sc1;
2. added switching between boot consoles;
3. added PXE boot.
The full list is:
> lib/libnet/pxelinux: Fix two off-by-one bugs in the pxelinux.cfg parser
> lib/libnet/pxelinux: Make the size handling for pxelinux_load_cfg more logical
> libc: Add a simple implementation of an assert() function
> libnet: Support UUID-based pxelinux.cfg file names
> slof: Add a helper function to get the contents of a property in C code
> libnet: Add support for DHCPv4 options 209 and 210
> libnet: Wire up pxelinux.cfg network booting
> libnet: Add functions for downloading and parsing pxelinux.cfg files
> libnet: Put code for determing TFTP error strings into a separate function
> libc: Add the snprintf() function
> libnet: Pass ip_version via struct filename_ip
> resolve ihandle and xt handle in the input command (like for the output)
> Fix output word
> obp-tftp: Make sure to not overwrite paflof in memory
> libnet: Get rid of unused huge_load and block_size parameters
> libc: Check for NULL pointers in free()
> libc: Implement strrchr()
> libnet: Get rid of unnecessary (char *) casts
> broken_sc1: check for H_PRIVILEGE
> OF: Use new property "stdout-path" for boot console
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to PPC440 User Manual PPC440 has multiple opcodes for icbt
instruction: one for compatibility with older cores and two 440
specific opcodes one of which is defined in BookE. QEMU only
implements two of these, add the missing one.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As well as being able to generate its own i2c transactions, the ppc4xx
i2c controller has a DIRECTCNTL register which allows explicit control
of the i2c lines.
Using this register an OS can directly bitbang i2c operations. In
order to let emulated i2c devices respond to this, we need to wire up
the DIRECTCNTL register to qemu's bitbanged i2c handling code.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We don't emulate slave mode so related registers are not needed.
[lh]sadr are only retained to avoid too many warnings and simplify
debugging but sdata is not even correct because device has a 4 byte
FIFO instead so just remove this unimplemented register for now.
The intr register is also not implemented correctly, it is for
diagnostics and normally not even visible on device without explicitly
enabling it. As no guests are known to need this remove it as well.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to the sm501 specs the hardware cursor colors are to be given in
the rgb565 format, but the code currently interprets them as bgr565.
Therefore, the colors of the hardware cursors are wrong in the QEMU
display, e.g., the standard mouse pointer of AmigaOS appears blue instead
of red. This change fixes this issue by replacing the existing naive
bgr565 => rgb888 conversion with a standard rgb565 => rgb888 one that also
scales the color component values properly.
Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix the helper_fpscr_clrbit() function so it correctly sets the FEX
and VX bits.
Determining the value for the Floating Point Status and Control
Register's (FPSCR) FEX bit is suppose to be done like this:
FEX = (VX & VE) | (OX & OE) | (UX & UE) | (ZX & ZE) | (XX & XE))
It is described as "the logical OR of all the floating-point exception
bits masked by their respective enable bits". It was not implemented
correctly. The value of FEX would stay on even when all other bits
were set to off.
The VX bit is described as "the logical OR of all of the invalid
operation exceptions". This bit was also not implemented correctly. It
too would stay on when all the other bits were set to off.
My main source of information is an IBM document called:
PowerPC Microprocessor Family:
The Programming Environments for 32-Bit Microprocessors
Page 62 is where the FPSCR information is located.
This is an older copy than the one I use but it is still very useful:
https://www.pdfdrive.net/powerpc-microprocessor-family-the-programming-environments-for-32-e3087633.html
I use a G3 and G5 iMac to compare bit values with QEMU. This patch
fixed all the problems I was having with these bits.
Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
[dwg: Re-wrapped commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, when a device requests for IRQ number in a sPAPR machine, the
spapr_irq_alloc() routine first scans the ICSState status array to
find an empty slot and then performs the assignement of the selected
numbers. Split this sequence in two distinct routines : spapr_irq_find()
for lookups and spapr_irq_claim() for claiming the IRQ numbers.
This will ease the introduction of a static layout of IRQ numbers.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM HV has a restriction that for HPT mode guests, guest pages must be hpa
contiguous as well as gpa contiguous. We have to account for that in
various places. We determine whether we're subject to this restriction
from the SMMU information exposed by KVM.
Planned cleanups to the way we handle this will require knowing whether
this restriction is in play in wider parts of the code. So, expose a
helper function which returns it.
This does mean some redundant calls to kvm_get_smmu_info(), but they'll go
away again with future cleanups.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
spapr capabilities have an apply hook to actually activate (or deactivate)
the feature in the system at reset time. However, a number of capabilities
affect the setup of cpus, and need to be applied to each of them -
including hotplugged cpus for extra complication. To make this simpler,
add an optional cpu_apply hook that is called from spapr_cpu_reset().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Previously, the effective values of the various spapr capability flags
were only determined at machine reset time. That was a lazy way of making
sure it was after cpu initialization so it could use the cpu object to
inform the defaults.
But we've now improved the compat checking code so that we don't need to
instantiate the cpus to use it. That lets us move the resolution of the
capability defaults much earlier.
This is going to be necessary for some future capabilities.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
ppc_check_compat() is used in a number of places to check if a cpu object
supports a certain compatiblity mode, subject to various constraints.
It takes a PowerPCCPU *, however it really only depends on the cpu's class.
We have upcoming cases where it would be useful to make compatibility
checks before we fully instantiate the cpu objects.
ppc_type_check_compat() will now make an equivalent check, but based on a
CPU's QOM typename instead of an instantiated CPU object.
We make use of the new interface in several places in spapr, where we're
essentially making a global check, rather than one specific to a particular
cpu. This avoids some ugly uses of first_cpu to grab a "representative"
instance.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The device tree node of the ISA bus was being partially done in
different places. Move all the nodes creation under the same routine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It introduces a base PnvChip class from which the specific processor
chip classes, Pnv8Chip and Pnv9Chip, inherit. Each of them needs to
define an init and a realize routine which will create the controllers
of the target processor. For the moment, the base PnvChip class
handles the XSCOM bus and the cores.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU implements the "Shared Processor LPAR" (SPLPAR) option, which allows
the hypervisor to time-slice a physical processor into multiple virtual
processor. The intent is to allow more guests to run, and to optimize
processor utilization.
The guest OS can cede idle VCPUs, so that their processing capacity may
be used by other VCPUs, with the H_CEDE hcall. The guest OS can also
optimize spinlocks, by confering the time-slice of a spinning VCPU to the
spinlock holder if it's currently notrunning, with the H_CONFER hcall.
Both hcalls depend on a "Virtual Processor Area" (VPA) to be registered
by the guest OS, generally during early boot. Other per-VCPU areas can
be registered: the "SLB Shadow Buffer" which allows a more efficient
dispatching of VCPUs, and the "Dispatch Trace Log Buffer" (DTL) which
is used to compute time stolen by the hypervisor. Both DTL and SLB Shadow
areas depend on the VPA to be registered.
The VPA/SLB Shadow/DTL are state that QEMU should migrate, but this doesn't
happen, for no apparent reason other than it was just never coded. This
causes the features listed above to stop working after migration, and it
breaks the logic of the H_REGISTER_VPA hcall in the destination.
The VPA is set at the guest request, ie, we don't have to migrate
it before the guest has actually set it. This patch hence adds an
"spapr_cpu/vpa" subsection to the recently introduced per-CPU machine
data migration stream.
Since DTL and SLB Shadow are optional and both depend on VPA, they get
their own subsections "spapr_cpu/vpa/slb_shadow" and "spapr_cpu/vpa/dtl"
hanging from the "spapr_cpu/vpa" subsection.
Note that this won't break migration to older QEMUs. Is is already handled
by only registering the vmstate handler for per-CPU data with newer machine
types.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A per-CPU machine data pointer was recently added to PowerPCCPU. The
motivation is to to hide platform specific details from the core CPU
code. This per-CPU data can hold state which is relevant to the guest
though, eg, Virtual Processor Areas, and we should migrate this state.
This patch adds the plumbing so that we can migrate the per-CPU data
for PAPR guests. We only do this for newer machine types for the sake
of backward compatibility. No state is migrated for the moment: the
vmstate_spapr_cpu_state structure will be populated by subsequent
patches.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix some trivial spelling and spacing errors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This moves the details of the ISA bus creation under the LPC model but
more important, the new PnvChip operation will let us choose the chip
class to use when we introduce the different chip classes for Power9
and Power8. It hides away the processor chip controllers from the
machine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Power9, the thread interrupt presenter has a different type and is
linked to the chip owning the cores.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
- accommodate guests using vfio-ccw without specifying unlimited
prefetch, but actually working fine
- add cpu model for the z14 Model ZR1
- add support for pxelinux.cfg-style network booting to the s390x
firmware
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAlsozdISHGNvaHVja0By
ZWRoYXQuY29tAAoJEN7Pa5PG8C+vafQQAIUl+TPrwXQQCp9xBy+CpMAAQtPp0PI2
DGisaVohLQARfNorqDXqPWMy9ulbYQ1bXcdpbhVwzIHlBZopMAvC+QTRwqs4EIFw
0vjvm5ttC2b3zt+7BAom2xWqaxmSen02O0Qj2l5HpFBVp+5afT74geiWhmmirWbw
x/JEEnlOcP2dHAvujfbv9Slst1B/OQ4xz7gf5IOlfX2MCsDcIyvqrw6a2D5G4PlX
DHPZ7PzA6edoG+qP1DvH6Myi19zs3edAKJp5sdCK92FgJBxW6nI1+ThoN3ApbuDm
U7HYeLzrQ/dXWiHMwA+2QP5Iqe/VfKCvqVcm9+m458dQLiN8ZiyGmUbbZtL0L0X8
KQB70mSQjawWRVLr1vsvZNAnCeqjT2hZz37EAwA3gGOeIfFjxFLNtBu4WKgt4soG
7Uq99irLBkfJKY56gQUc0LYOKhTMlDk3Z3xzv2Y2q29T+Cp0927ruHQdRFo3xQAu
kBc1pIjt2Rfuh2EJkyoprMuM/e5E8OZZv1uLbqUOYKxIHCHp05wqNEva5WOhS5Ah
05Un03Vw/CVWnXuNSYR+eKrMydIZALA+GTpQYZuiAo8+dT9R3dmTHjQRqOdck3lB
BU1MzU3kfflmmToi4Pto1PtNPeAaCWMcFNLOgky3L+Wfx2p4WQxYZP2g42be0K4U
m4G6RXHStHCy
=hMax
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20180619' into staging
- cleanup in virtio-ccw
- accommodate guests using vfio-ccw without specifying unlimited
prefetch, but actually working fine
- add cpu model for the z14 Model ZR1
- add support for pxelinux.cfg-style network booting to the s390x
firmware
# gpg: Signature made Tue 19 Jun 2018 10:33:06 BST
# gpg: using RSA key DECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20180619:
pc-bios/s390-ccw: Update the s390-netboot.img binary
pc-bios/s390-ccw: Optimize the s390-netboot.img for size
pc-bios/s390-ccw/net: Try to load pxelinux.cfg file accoring to the UUID
pc-bios/s390-ccw/net: Add support for pxelinux-style config files
pc-bios/s390-ccw/net: Update code for the latest changes in SLOF
roms: Update SLOF submodule to current status
pc-bios/s390-ccw: define loadparm length
s390x/cpumodels: add z14 Model ZR1
s390x/ipl: Try to detect Linux vs non Linux for initial IPL PSW
vfio-ccw: remove orb.c64 (64 bit data addresses) check
vfio-ccw: add force unlimited prefetch property
virtio-ccw: clean up notify
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Next batch of ppc and spapr related patches for the 3.0 release.
* Improved handling of Spectre/Meltdown mitigations for POWER8
* Numerous Mac machine type cleanups and improvements
* Cleanup to cpu realize/unrealize path for spapr
* Create a place for machine-specific per-cpu information, and
start moving some things to it
* Assorted bugfixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlsnLIUACgkQbDjKyiDZ
s5KrEQ//dzJthAN4beCbBKdeY6sOTsmOTMbrKAZyH0zMZBgn1QyfghwqY0iGmbDA
zDJQM62Q4hVUtELWslROqHbdVAo3wa8w3mOddqGeVr3Jla6VGpJxNzJVUgGgGDK0
OoInkljzqnlOGMKn09aW2v21dBbiLakplNDJOCJXw1ZKg75JYgAmo1no/1lgbAUQ
9qBGbGA5wswUK/o/cBatqG180BWDmRLiHyWWAMJzQrvAuz4WvkhRhkBecA14OK2E
Iw3l0LJ8acd4zKVeUhh/FOlsViRi/lTKYvDte14SIADEfvpBW77Ts49NG/vGJFHu
OOEhsdOBIIrMi05lUkeAqGJuECw0Y8ADBfmeB9Z03QeBCio45CFEifPSsFvmi4k9
RpRUHOOSJR0QOB9vp8Ryy68krFcX0NDuH7YnOn3hTFoJiiUC/SNs1O+3CYL0fYdm
IgdN3SoFWJ9vJciBGZuUwBxulVT5z6JoPfAzi8Gz9euebnhcCwBP2gqpUICon0SG
ELnSNRgnTp1GNbAOFb23nItJeyRtf0BisBuNR34Kv4OWUvWEKBE0BRsPi+VQnlxB
PzUgJpibfdMBBqsotNAt9zefQRamrphl12MMR96+melWPfNLsFz+sxMUNji+50lG
rLVgk1/ySEynuU6g3rYegUZSW9W7Iq/5e7CKQ5Iv9IV219f+reg=
=CAxa
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180618' into staging
ppc patch queue 2018-06-18
Next batch of ppc and spapr related patches for the 3.0 release.
* Improved handling of Spectre/Meltdown mitigations for POWER8
* Numerous Mac machine type cleanups and improvements
* Cleanup to cpu realize/unrealize path for spapr
* Create a place for machine-specific per-cpu information, and
start moving some things to it
* Assorted bugfixes
# gpg: Signature made Mon 18 Jun 2018 04:52:37 BST
# gpg: using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-3.0-20180618: (28 commits)
spapr: fix xics_system_init() error path
target/ppc, spapr: Move VPA information to machine_data
ppc/pnv: introduce a pnv_chip_core_realize() routine
spapr_cpu_core: introduce spapr_create_vcpu()
spapr_cpu_core: add missing rollback on realization path
spapr_cpu_core: fix potential leak in spapr_cpu_core_realize()
spapr_cpu_core: convert last snprintf() to g_strdup_printf()
pnv: Add cpu unrealize path
pnv: Clean up cpu realize path
pnv_core: Allocate cpu thread objects individually
pnv: Fix some error handling cpu realize()
spapr: Clean up cpu realize/unrealize paths
sm501: Do not clear read only bits when writing registers
mos6522: expose mos6522_update_irq() through MOS6522DeviceClass
mos6522: remove additional interrupt flag filter from mos6522_update_irq()
mos6522: only clear the shift register interrupt upon write
xics_kvm: fix a build break
mac_newworld: add PMU device
adb: add property to disable direct reg 3 writes
adb: fix read reg 3 byte ordering
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch allows the user to specify whether to use active or only
background mode for mirror block jobs. Currently, this setting will
remain constant for the duration of the entire block job.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch implements active synchronous mirroring. In active mode, the
passive mechanism will still be in place and is used to copy all
initially dirty clusters off the source disk; but every write request
will write data both to the source and the target disk, so the source
cannot be dirtied faster than data is mirrored to the target. Also,
once the block job has converged (BLOCK_JOB_READY sent), source and
target are guaranteed to stay in sync (unless an error occurs).
Active mode is completely optional and currently disabled at runtime. A
later patch will add a way for users to enable it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-13-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180613181823.13618-12-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This will allow us to access the block job data when the mirror block
driver becomes more complex.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This new function allows to look for a consecutively dirty area in a
dirty bitmap.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add a function that wraps hbitmap_iter_next() and always calls it in
non-advancing mode first, and in advancing mode next. The result should
always be the same.
By using this function everywhere we called hbitmap_iter_next() before,
we should get good test coverage for non-advancing hbitmap_iter_next().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-9-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This new parameter allows the caller to just query the next dirty
position without moving the iterator.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-8-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Currently, bdrv_replace_node() refuses to create loops from one BDS to
itself if the BDS to be replaced is the backing node of the BDS to
replace it: Say there is a node A and a node B. Replacing B by A means
making all references to B point to A. If B is a child of A (i.e. A has
a reference to B), that would mean we would have to make this reference
point to A itself -- so we'd create a loop.
bdrv_replace_node() (through should_update_child()) refuses to do so if
B is the backing node of A. There is no reason why we should create
loops if B is not the backing node of A, though. The BDS graph should
never contain loops, so we should always refuse to create them.
If B is a child of A and B is to be replaced by A, we should simply
leave B in place there because it is the most sensible choice.
A more specific argument would be: Putting filter drivers into the BDS
graph is basically the same as appending an overlay to a backing chain.
But the main child BDS of a filter driver is not "backing" but "file",
so restricting the no-loop rule to backing nodes would fail here.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
With this, the mirror_top_bs is no longer just a technically required
node in the BDS graph but actually represents the block job operation.
Also, drop MirrorBlockJob.source, as we can reach it through
mirror_top_bs->backing.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch makes the mirror code differentiate between simply waiting
for any operation to complete (mirror_wait_for_free_in_flight_slot())
and specifically waiting for all operations touching a certain range of
the virtual disk to complete (mirror_wait_on_conflicts()).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Attach a CoQueue to each in-flight operation so if we need to wait for
any we can use it to wait instead of just blindly yielding and hoping
for some operation to wake us.
A later patch will use this infrastructure to allow requests accessing
the same area of the virtual disk to specifically wait for each other.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
In order to talk to the source BDS (and maybe in the future to the
target BDS as well) directly, we need to convert our existing AIO
requests into coroutine I/O requests.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
When converting mirror's I/O to coroutines, we are going to need a point
where these coroutines are created. mirror_perform() is going to be
that point.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Introduce a new global big lock for mon_fdsets. Take it where needed.
The monitor_fdset_get_fd() handling is a bit tricky: now we need to call
qemu_mutex_unlock() which might pollute errno, so we need to make sure
the correct errno be passed up to the callers. To make things simpler,
we let monitor_fdset_get_fd() return the -errno directly when error
happens, then in qemu_open() we move it back into errno.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-8-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>