spapr_kvm_type() is considering 'vm_type=NULL' as a valid input, where
the function returns 0. This is relying on the current QEMU machine
options handling logic, where the absence of the 'kvm-type' option
will be reflected as 'vm_type=NULL' in this function.
This is not robust, and will break if QEMU options code decides to propagate
something else in the case mentioned above (e.g. an empty string instead
of NULL).
Let's avoid this entirely by setting a non-NULL default value in case of
no user input for 'kvm-type'. spapr_kvm_type() was changed to handle 3 fixed
values of kvm-type: "auto", "hv", and "pr", with "auto" being the default
if no kvm-type was set by the user. This allows us to always be predictable
regardless of any enhancements/changes made in QEMU options mechanics.
While we're at it, let's also document in 'kvm-type' description the
already existing default mode, now named 'auto'. The information provided
about it is based on how the pseries kernel handles the KVM_CREATE_VM
ioctl(), where the default value '0' makes the kernel choose an available
KVM module to use, giving precedence to kvm_hv. This logic is described in
the kernel source file arch/powerpc/kvm/powerpc.c, function kvm_arch_init_vm().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201210145517.1532269-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Some functions in hw/ppc/spapr_events.c get a pointer to the machine
state using qdev_get_machine(). Convert them to get it from their
caller when possible.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201209170052.1431440-6-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_phb_realize() passes the sPAPR machine state as opaque data
for the I/O callbacks:
memory_region_init_io(&sphb->msiwindow, OBJECT(sphb), &spapr_msi_ops, spapr,
^^^^^
"msi", msi_window_size);
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201209170052.1431440-5-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This allows to drop a user of qdev_get_machine().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201209170052.1431440-4-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Callers don't really need to know how 64-bit MMU model enums are
computed. Hide this in a helper.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201209173536.1437351-3-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The ppc_tr_init_disas_context() function currently checks whether the
MMU is 64-bit by ANDing its model type with POWERPC_MMU_64B. This is
wrong : POWERPC_MMU_64B isn't a mask, it is the generic MMU model for
pre-PowerISA-2.03 64-bit CPUs (ie. PowerPC 970 in QEMU).
Use POWERPC_MMU_64 instead of POWERPC_MMU_64B. This should fix a
potential bug with some 32-bit CPUs for which 'need_access_type'
was mis-computed because (POWERPC_MMU_32B & POWERPC_MMU_64B)
happens to be equal to 1. The end result being a crash in
ppc_hash32_direct_store() because the access type isn't set:
cpu_abort(cs, "ERROR: instruction should not need "
"address translation\n");
This doesn't change anything for 'lazy_tlb_flush' since POWERPC_MMU_32B
is checked first.
Fixes: 5f2a625452 ("ppc: Don't set access_type on all load/stores on hash64")
Signed-off-by: Stephane Duverger <stephane.duverger@free.fr>
[groug: - extended patch to address another misuse of POWERPC_MMU_64B
- updated title and changelog accordingly]
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201209173536.1437351-2-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When running qom-test, a memory leak occurred in the ppce500_init function,
this patch free irqs array to fix it.
ASAN shows memory leak stack:
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0xfffc5ceee1f0 in __interceptor_calloc (/lib64/libasan.so.5+0xee1f0)
#1 0xfffc5c806800 in g_malloc0 (/lib64/libglib-2.0.so.0+0x56800)
#2 0xaaacf9999244 in ppce500_init qemu/hw/ppc/e500.c:859
#3 0xaaacf97434e8 in machine_run_board_init qemu/hw/core/machine.c:1134
#4 0xaaacf9c9475c in qemu_init qemu/softmmu/vl.c:4369
#5 0xaaacf94785a0 in main qemu/softmmu/main.c:49
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Message-Id: <20201204075822.359832-1-ganqixin@huawei.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Greg has agreed to be co-maintainer of the ppc target and machines.
This should avoid repeats of the problem we had in qemu-5.2 where a
last minute fix was needed while I was on holiday.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Greg Kurz <groug@kaod.org>
A guest with enough RAM, eg. 128G, is likely to detect savevm downtime
and to complain about stalled CPUs. This happens because we re-read
the timebase just before migrating it and we thus don't account for
all the time between VM stop and pre-save.
A very similar situation was already addressed for live migration of
paused guests (commit d14f339762). Extend the logic to do the same
with savevm.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1893787
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160693010619.1111945.632640981169395440.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This property has been deprecated since QEMU 5.0 by commit 22062e54bb.
We only kept a legacy hack that internally converts "compat" into the
official "max-cpu-compat" property of the pseries machine type.
According to our deprecation policy, we could have removed it for QEMU 5.2
already. Do it now ; since ppc_cpu_parse_featurestr() now just calls the
generic parent_parse_features handler, drop it as well.
Users are supposed to use the "max-cpu-compat" property of the pseries
machine type instead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201131103.897430-1-groug@kaod.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All users are passing &error_abort already. Document the fact
that spapr_drc_attach() should only be passed a free DRC, which
is supposedly the case if appropriate checking is done earlier.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-5-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_core_pre_plug() already guarantees that the slot for the given core
ID is available. It is thus safe to assume that spapr_find_cpu_slot()
returns a slot during plug. Turn the error path into an assertion.
It is also safe to assume that no device is attached to the corresponding
DRC and that spapr_drc_attach() shouldn't fail.
Pass &error_abort to spapr_drc_attach() and simplify error handling.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-4-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When a CPU is hot-plugged, we set its compat mode to match the boot
CPU, which was either set by machine reset or by CAS. This is currently
handled in the plug handler after the core got realized. Potential errors
of ppc_set_compat() are propagated to the hot-plug logic.
Handling errors this late in the hot-plug sequence is generally frown
upon. Ideally, we should do sanity checks in a pre-plug handler and pass
&error_abort to ppc_set_compat() in the plug handler.
We can filter out some error cases of ppc_set_compat() by calling
ppc_check_compat() at pre-plug. But ppc_set_compat() also sets the
compat register in KVM, and KVM doesn't provide any API that would
allow to check valid compat mode settings beforehand.
However, at this point we know that the compat mode was already
successfully set for the boot CPU. Since this all boils down to
setting a register with the very same value that was valid
for the boot CPU, it should definitely not fail for hot-plugged
CPUS.
Pass &error_abort to ppc_set_compat().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-3-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This hack registers dummy VMState entries of ICPs in order to
support migration of old pseries machine types that used to
create all smp.max_cpus possible ICPs at machine init.
Part of the work is to unregister the dummy entries when plugging
an actual vCPU core, and to register them back when unplugging the
core. The code that unregisters the dummy ICPs in spapr_core_plug()
is misplaced: if ppc_set_compat() fails afterwards, the hotplug
operation will be cancelled and the dummy ICPs won't be registered
back since the unplug handler isn't called.
Unregister the dummy ICPs at the end of spapr_core_plug().
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-2-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
I have been keeping those logging messages in an ugly form for
while. Make them clean !
Beware not to activate all of them, this is really verbose.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20201123163717.1368450-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The '%u' conversion specifier is for decimal notation.
When prefixing a format with '0x', we want the hexadecimal
specifier ('%x').
Inspired-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201103112558.2554390-4-philmd@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Make the implementation match the lxvwsx one.
The code is now shorter smaller and potentially faster as the
translation will use the host SIMD capabilities if available.
No functional change.
Signed-off-by: Giuseppe Musacchio <thatlemon@gmail.com>
Message-Id: <a463dea379da4cb3a22de49c678932f74fb15dd7.1604912739.git.thatlemon@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PowerISA reference states that the comparison operators update the
FPCC, CR and FPSCR and, if VE=1, jump to the exception handler.
Moving the exception-triggering code after the CC update sequence solves
the problem.
Signed-off-by: Giuseppe Musacchio <thatlemon@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201112230130.65262-5-thatlemon@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since we always perform a comparison between the two operands avoid
checking for NaN unless the result states they're unordered.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Giuseppe Musacchio <thatlemon@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201112230130.65262-4-thatlemon@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Giuseppe Musacchio <thatlemon@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201112230130.65262-3-thatlemon@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to the PowerISA v3.1 reference, Table 68 "Actions for xscmpudp
- Part 1: Compare Unordered", whenever one of the two operands is a NaN
the SO bit is set while the other three bits are cleared.
Apply the same change to xscmpuqp.
The respective ordered counterparts are unaffected.
Signed-off-by: Giuseppe Musacchio <thatlemon@gmail.com>
Message-Id: <20201112230130.65262-2-thatlemon@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When using -Wimplicit-fallthrough in our CFLAGS, the compiler showed warning:
hw/ppc/ppc.c: In function ‘ppc6xx_set_irq’:
hw/ppc/ppc.c:118:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
118 | if (level) {
| ^
hw/ppc/ppc.c:123:9: note: here
123 | case PPC6xx_INPUT_INT:
| ^~~~
According to the discussion, a break statement needs to be added here.
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20201116024810.2415819-7-kuhn.chenqun@huawei.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When using -Wimplicit-fallthrough in our CFLAGS, the compiler showed warning:
target/ppc/mmu_helper.c: In function ‘dump_mmu’:
target/ppc/mmu_helper.c:1351:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
1351 | if (ppc64_v3_radix(env_archcpu(env))) {
| ^
target/ppc/mmu_helper.c:1358:5: note: here
1358 | default:
| ^~~~~~~
Use "qemu_log_mask(LOG_UNIMP**)" instead of the TODO comment.
And add the break statement to fix it.
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20201116024810.2415819-8-kuhn.chenqun@huawei.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There can be only one TPM proxy at a time. This is currently
checked at plug time. But this can be detected at pre-plug in
order to error out earlier.
This allows to get rid of error handling in the plug handler.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-9-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We currently detect that a PHB index is already in use at plug time.
But this can be decteted at pre-plug in order to error out earlier.
This allows to pass &error_abort to spapr_drc_attach() and to end
up with a plug handler that doesn't need to report errors anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-8-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Read documentation in "qapi/error.h" and changelog of commit
e3fe3988d7 ("error: Document Error API usage rules") for
rationale.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-7-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Pre-plug of a memory device, be it an NVDIMM or a PC-DIMM, ensures
that the memory slot is available and that addresses don't overlap
with existing memory regions. The corresponding DRCs in the LMB
and PMEM namespaces are thus necessarily attachable at plug time.
Pass &error_abort to spapr_drc_attach() in spapr_add_lmbs() and
spapr_add_nvdimm(). This allows to greatly simplify error handling
on the plug path.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-3-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PHB acts as the hotplug handler for PCI devices. It does some
sanity checks on DR enablement, PCI bridge chassis numbers and
multifunction. These checks are currently performed at plug time,
but they would best sit in a pre-plug handler in order to error
out as early as possible.
Create a spapr_pci_pre_plug() handler and move all the checking
there. Add a check that the associated DRC doesn't already have
an attached device. This is equivalent to the slot availability
check performed by do_pci_register_device() upon realization of
the PCI device.
This allows to pass &error_abort to spapr_drc_attach() and to end
up with a plug handler that doesn't need to report errors anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-2-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Never used from the start.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120174646.619395-6-groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sPAPR XIVE device is created by the machine in spapr_irq_init().
The latter overrides any value provided by the user with -global for
the "nr-irqs" and "nr-ends" properties with strictly positive values.
It seems reasonable to assume these properties should never be 0,
which wouldn't make much sense by the way.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120174646.619395-2-groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
I found that there are many spelling errors in the comments of qemu/target/m68k.
I used spellcheck to check the spelling errors and found some errors in the folder.
Signed-off-by: zhaolichang <zhaolichang@huawei.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daude<f4bug@amsat.org>
Reviewed-by: Laurent Vivier<laurent@vivier.eu>
Message-Id: <20201009064449.2336-9-zhaolichang@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
They are unused since the target has been converted to TCG.
Fixes: e1f3808e03 ("Convert m68k target to TCG.")
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20201022203000.1922749-2-laurent@vivier.eu>
The handling of the GLUE (General Logic Unit) device is
currently open-coded. Make this into a proper QOM device.
This minor piece of modernisation gets rid of the free
floating qemu_irq array 'pic', which Coverity points out
is technically leaked when we exit the machine init function.
(The replacement glue device is not leaked because it gets
added to the sysbus, so it's accessible via that.)
Fixes: Coverity CID 1421883
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201106235109.7066-3-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The q800 board code connects both of the IRQ outputs of the ESCC
to the same pic[3] qemu_irq. Connecting two qemu_irqs outputs directly
to the same input is not valid as it produces subtly wrong behaviour
(for instance if both the IRQ lines are high, and then one goes
low, the PIC input will see this as a high-to-low transition
even though the second IRQ line should still be holding it high).
This kind of wiring needs an explicitly created OR gate; add one.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20201106235109.7066-2-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
- Support for FUSE exports
- Fix deadlock in bdrv_co_yield_to_drain()
- Use lock guard macros
- Some preparational patches for 64 bit block layer
- file-posix: Fix request extension to INT64_MAX in raw_do_pwrite_zeroes()
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl/TpwsRHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9alQxAAxBbtKSakFvRLRsJ/zoKSGF/8r/O7CSJ3
zvD3LetdRRn98y2rLIKde/82kMmdjF62+KAoz5QboOf/3FO+1QJfSDO16hDR4rDQ
iXLoJoMULNsugU0fJzJSbPt0CQ1din+zrr96VZcP5wkYEzRx3l7ANq0ZmD42C7rN
FNc4r7FTB3JuCb1QHS19Ql1XTJh50zWoO4CluQbxTv23YX5RrBoN0xCbmq67PCAs
nuJiA5D1J907nlh+xzfM2qOpuILfNULoch+dT0f0rYrmpANHzJX+VmjsTwQvkDdz
7CdVQsyXd1YmU0JMM41acQAe3WCxmIps2WaFRiVFRVrGWo29NJourb5OEYY7XN0T
DEKwLbQ9nXPLN6FnedwMBZ9r4FNThsu5tdjbW+MEw8xSukxfnUjLYiBKHroRckDv
7n5gBu9T9x3DI+SpFo1YVhdeS+W3eHtZeyYoP2nahuTbMjhRCYwpqEtrnMldAcFa
mYea9lqDeuAEPo5XBkQK7XFT8gpJp9CByr0jPkf/ZRa0TJhgA0TLmEdEEDScdk6M
RMkhvfcyOxVj30rw8yy/BjZ+boJ9eVAHkUUfbB0ZD9pcrYwMzEgw++aC1hHUGzxB
B1XPWk1ueOtGXPKG+6fvwVv9yrqW55eJ/8DSwjos+bvhzX6yGitpBiEFiHJIpcjY
e17e7M3ocf4=
=ISxg
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- Support for FUSE exports
- Fix deadlock in bdrv_co_yield_to_drain()
- Use lock guard macros
- Some preparational patches for 64 bit block layer
- file-posix: Fix request extension to INT64_MAX in raw_do_pwrite_zeroes()
# gpg: Signature made Fri 11 Dec 2020 17:06:19 GMT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (34 commits)
block: Fix deadlock in bdrv_co_yield_to_drain()
block: Fix locking in qmp_block_resize()
block: Simplify qmp_block_resize() error paths
block: introduce BDRV_MAX_LENGTH
block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()
block/io: bdrv_refresh_limits(): use ERRP_GUARD
block/file-posix: fix workaround in raw_do_pwrite_zeroes()
can-host: Fix crash when 'canbus' property is not set
iotests/221: Discard image before qemu-img map
file-posix: check the use_lock before setting the file lock
iotests/308: Add test for FUSE exports
iotests: Enable fuse for many tests
iotests: Allow testing FUSE exports
iotests: Give access to the qemu-storage-daemon
storage-daemon: Call bdrv_close_all() on exit
iotests/287: Clean up subshell test image
iotests: Let _make_test_img guess $TEST_IMG_FILE
iotests: Restrict some Python tests to file
iotests/091: Use _cleanup_qemu instad of "wait"
iotests: Derive image names from $TEST_IMG
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
If bdrv_co_yield_to_drain() is called for draining a block node that
runs in a different AioContext, it keeps that AioContext locked while it
yields and schedules a BH in the AioContext to do the actual drain.
As long as executing the BH is the very next thing that the event loop
of the node's AioContext does, this actually happens to work, but when
it tries to execute something else that wants to take the AioContext
lock, it will deadlock. (In the bug report, this other thing is a
virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)
Instead, always drop the AioContext lock across the yield and reacquire
it only when the coroutine is reentered. The BH needs to unconditionally
take the lock for itself now.
This fixes the 'block_resize' QMP command on a block node that runs in
an iothread.
Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-4-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The drain functions assume that we hold the AioContext lock of the
drained block node. Make sure to actually take the lock.
Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-3-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only thing that happens after the 'out:' label is blk_unref(blk).
However, blk = NULL in all of the error cases, so instead of jumping to
'out:', we can just return directly.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-2-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.
It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.
So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.
And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.
Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.
Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.
So, alignment can't be greater than 2^30.
Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.
OK, then, let's apply it to block/io.
Let's consider all block/io entry points of offset/bytes:
4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().
We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.
And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:
file-posix.c: adopted by this patch
write-threshold.c: only read fields
test-write-threshold.c: sets obviously small constant values
Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.
After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies a offset/bytes request, expanding
it more then aligning up to request_alignment, it's a bug too.
For all io requests except for discard we keep for now old restriction
of 32bit request length.
iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move bdrv_is_inserted() calls into callers.
We are going to make bdrv_check_byte_request() a clean thing.
bdrv_is_inserted() is not about checking the request, it's about
checking the bs. So, it should be separate.
With this patch we probably change error path for some failure
scenarios. But depending on the fact that querying too big request on
empty cdrom (or corrupted qcow2 node with no drv) will result in EIO
and not ENOMEDIUM would be very strange. More over, we are going to
move to 64bit requests, so larger requests will be allowed anyway.
More over, keeping in mind that cdrom is the only driver that has
.bdrv_is_inserted() handler it's strange that we should care so much
about it in generic block layer, intuitively we should just do read and
write, and cdrom driver should return correct errors if it is not
inserted. But it's a work for another series.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This simplifies following commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We should not set overlap_bytes:
1. Don't worry: it is calculated by bdrv_mark_request_serialising() and
will be equal to or greater than bytes anyway.
2. If the request was already aligned up to some greater alignment,
than we may break things: we reduce overlap_bytes, and further
bdrv_mark_request_serialising() may not help, as it will not restore
old bigger alignment.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Providing the 'if' property, but not 'canbus' segfaults like this:
#0 0x0000555555b0f14d in can_bus_insert_client (bus=0x0, client=0x555556aa9af0) at ../net/can/can_core.c:88
#1 0x00005555559c3803 in can_host_connect (ch=0x555556aa9ac0, errp=0x7fffffffd568) at ../net/can/can_host.c:62
#2 0x00005555559c386a in can_host_complete (uc=0x555556aa9ac0, errp=0x7fffffffd568) at ../net/can/can_host.c:72
#3 0x0000555555d52de9 in user_creatable_complete (uc=0x555556aa9ac0, errp=0x7fffffffd5c8) at ../qom/object_interfaces.c:23
Add the missing NULL check.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201130105615.21799-5-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
See the new comment for why this should be done.
I do not have a reproducer on master, but when using FUSE block exports,
this test breaks depending on the underlying filesystem (for me, it
works on tmpfs, but fails on xfs, because the block allocated by
file-posix has 16 kB there instead of 4 kB).
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201207152245.66987-1-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The scenario is that when accessing a volume on an NFS filesystem
without supporting the file lock, Qemu will complain "Failed to lock
byte 100", even when setting the file.locking = off.
We should do file lock related operations only when the file.locking is
enabled, otherwise, the syscall of 'fcntl' will return non-zero.
Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have good coverage of the normal I/O paths now, but what remains is a
test that tests some more special cases: Exporting an image on itself
(thus turning a formatted image into a raw one), some error cases, and
non-writable and non-growable exports.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201027190600.192171-21-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>