Calls to the KVM XIVE device are guarded by kvm_irqchip_in_kernel(). This
ensures that QEMU won't try to use the device if KVM is disabled or if
an in-kernel irqchip isn't required.
When using ic-mode=dual with the pseries machine, we have two possible
interrupt controllers: XIVE and XICS. The kvm_irqchip_in_kernel() helper
will return true as soon as any of the KVM device is created. It might
lure QEMU to think that the other one is also around, while it is not.
This is exactly what happens with ic-mode=dual at machine init when
claiming IRQ numbers, which must be done on all possible IRQ backends,
eg. RTAS event sources or the PHB0 LSI table : only the KVM XICS device
is active but we end up calling kvmppc_xive_source_reset_one() anyway,
which fails. This doesn't cause any trouble because of another bug :
kvmppc_xive_source_reset_one() lacks an error_setg() and callers don't
see the failure.
Most of the other kvmppc_xive_* functions have similar xive->fd
checks to filter out the case when KVM XIVE isn't active. It
might look safer to have idempotent functions but it doesn't
really help to understand what's going on when debugging.
Since we already have all the kvm_irqchip_in_kernel() in place,
also have the callers to check xive->fd as well before calling
KVM XIVE specific code. This is straight-forward for the spapr
specific XIVE code. Some more care is needed for the platform
agnostic XIVE code since it cannot access xive->fd directly.
Introduce new in_kernel() methods in some base XIVE classes
for this purpose and implement them only in spapr.
In all cases, we still need to call kvm_irqchip_in_kernel() so that
compilers can optimize the kvmppc_xive_* calls away when CONFIG_KVM
isn't defined, thus avoiding the need for stubs.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159679993438.876294.7285654331498605426.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Depending on whether XIVE is emultated or backed with a KVM XIVE device,
the ESB MMIOs of a XIVE source point to an I/O memory region or a mapped
memory region.
This is currently handled by checking kvm_irqchip_in_kernel() returns
false in xive_source_realize(). This is a bit awkward as we usually
need to do extra things when we're using the in-kernel backend, not
less. But most important, we can do better: turn the existing "xive.esb"
memory region into a plain container, introduce an "xive.esb-emulated"
I/O subregion and rename the existing "xive.esb" subregion in the KVM
code to "xive.esb-kvm". Since "xive.esb-kvm" is added with overlap
and a higher priority, it prevails over "xive.esb-emulated" (ie.
a guest using KVM XIVE will interact with "xive.esb-kvm" instead of
the default "xive.esb-emulated" region.
While here, consolidate the computation of the MMIO region size in
a common helper.
Suggested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <159679992680.876294.7520540158586170894.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add PPC2_FEATURE2_ARCH_3_10 to the PowerPC AT_HWCAP2 definitions.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Message-Id: <20200724045845.89976-2-ljp@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
CONFIG_XEN is generated by configure and stored in "config-target.h",
which is (obviously) only include for target-specific objects.
This is a problem for target-agnostic objects as CONFIG_XEN is never
defined and xen_enabled() is always inlined as 'false'.
Fix by following the KVM schema, defining CONFIG_XEN_IS_POSSIBLE
when we don't know to force the call of the non-inlined function,
returning the xen_allowed boolean.
Fixes: da278d58a0 ("accel: Move Xen accelerator code under accel/xen/")
Reported-by: Paul Durrant <pdurrant@amazon.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20200804074930.13104-2-philmd@redhat.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
The NVIC provides an outbound qemu_irq "SYSRESETREQ" which it signals
when the guest sets the SYSRESETREQ bit in the AIRCR register. This
matches the hardware design (where the CPU has a signal of this name
and it is up to the SoC to connect that up to an actual reset
mechanism), but in QEMU it mostly results in duplicated code in SoC
objects and bugs where SoC model implementors forget to wire up the
SYSRESETREQ line.
Provide a default behaviour for the case where SYSRESETREQ is not
actually connected to anything: use qemu_system_reset_request() to
perform a system reset. This will allow us to remove the
implementations of SYSRESETREQ handling from the boards where that's
exactly what it does, and also fixes the bugs in the board models
which forgot to wire up the signal:
* microbit
* mps2-an385
* mps2-an505
* mps2-an511
* mps2-an521
* musca-a
* musca-b1
* netduino
* netduinoplus2
We still allow the board to wire up the signal if it needs to, in case
we need to model more complicated reset controller logic or to model
buggy SoC hardware which forgot to wire up the line itself. But
defaulting to "reset the system" is more often going to be correct
than defaulting to "do nothing".
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20200728103744.6909-3-peter.maydell@linaro.org
Mostly devices don't need to care whether one of their output
qemu_irq lines is connected, because functions like qemu_set_irq()
silently do nothing if there is nothing on the other end. However
sometimes a device might want to implement default behaviour for the
case where the machine hasn't wired the line up to anywhere.
Provide a function qemu_irq_is_connected() that devices can use for
this purpose. (The test is trivial but encapsulating it in a
function makes it easier to see where we're doing it in case we need
to change the implementation later.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20200728103744.6909-2-peter.maydell@linaro.org
Minor bugfixes all over the places, including one CVE.
Additionally, a fix for an ancient bug in migration -
one has to wonder how come no one noticed.
The fix is also non-trivial since we dare not break all
existing machine types with pci - we have a work around
in the works, for now we just skip the work-around for
old machine types.
Great job by Hogan Wang noticing, debugging and fixing it,
and thanks to Dr. David Alan Gilbert for reviewing the patches.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl8e9CIPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpsAIH/2EEq9rLpjqMJdzRvjq3/UAHsvm42zeTnJl7
81cM887Mrg2Nd7MXFoxurLK5UEehTzlD2DRTvaDFfJaJlrtkPM2QEU2X/6c3syAS
GbmOQaljQtR4zEFE81t84mZQS025Gp0s+uble7KvtXakgp1A/vdu93OEvJkhtRY8
JBdRMlTt2T0eizvHn1obBKjaQN7tAUKl5KagHWxP1ApGU0YibUbrBadpJ18ZcKMl
vwB3dwmoi4f7AjuC0GnxYKp7kC/MMhUPFoDxQKI7d+wMGFnbsAF4sBIN9EZKeOkv
xT2InNSAzk/PTSuQpnDnZQjmrf4dPuL/GNJ8vQk27eaFfVchJyc=
=Bu6o
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio,pci: bugfixes
Minor bugfixes all over the places, including one CVE.
Additionally, a fix for an ancient bug in migration -
one has to wonder how come no one noticed.
The fix is also non-trivial since we dare not break all
existing machine types with pci - we have a work around
in the works, for now we just skip the work-around for
old machine types.
Great job by Hogan Wang noticing, debugging and fixing it,
and thanks to Dr. David Alan Gilbert for reviewing the patches.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 27 Jul 2020 16:34:58 BST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio-pci: fix virtio_pci_queue_enabled()
MAINTAINERS: Cover the firmware JSON schema
vhost-vdpa :Fix Coverity CID 1430270 / CID 1420267
libvhost-user: Report descriptor index on panic
Fix vhost-user buffer over-read on ram hot-unplug
hw/pci-host: save/restore pci host config register
virtio-mem-pci: force virtio version 1
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In legacy mode, virtio_pci_queue_enabled() falls back to
virtio_queue_enabled() to know if the queue is enabled.
But virtio_queue_enabled() calls again virtio_pci_queue_enabled()
if k->queue_enabled is set. This ends in a crash after a stack
overflow.
The problem can be reproduced with
"-device virtio-net-pci,disable-legacy=off,disable-modern=true
-net tap,vhost=on"
And a look to the backtrace is very explicit:
...
#4 0x000000010029a438 in virtio_queue_enabled ()
#5 0x0000000100497a9c in virtio_pci_queue_enabled ()
...
#130902 0x000000010029a460 in virtio_queue_enabled ()
#130903 0x0000000100497a9c in virtio_pci_queue_enabled ()
#130904 0x000000010029a460 in virtio_queue_enabled ()
#130905 0x0000000100454a20 in vhost_net_start ()
...
This patch fixes the problem by introducing a new function
for the legacy case and calls it from virtio_pci_queue_enabled().
It also calls it from virtio_queue_enabled() to avoid code duplication.
Fixes: f19bcdfedd ("virtio-pci: implement queue_enabled method")
Cc: Jason Wang <jasowang@redhat.com>
Cc: Cindy Lu <lulu@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20200727153319.43716-1-lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The pci host config register is used to save PCI address for
read/write config data. If guest writes a value to config register,
and then QEMU pauses the vcpu to migrate, after the migration, the guest
will continue to write pci config data, and the write data will be ignored
because of new qemu process losing the config register state.
To trigger the bug:
1. guest is booting in seabios.
2. guest enables the SMRAM in seabios:piix4_apmc_smm_setup, and then
expects to disable the SMRAM by pci_config_writeb.
3. after guest writes the pci host config register, QEMU pauses vcpu
to finish migration.
4. guest write of config data(0x0A) fails to disable the SMRAM because
the config register state is lost.
5. guest continues to boot and crashes in ipxe option ROM due to SMRAM
in enabled state.
Example Reproducer:
step 1. Make modifications to seabios and qemu for increase reproduction
efficiency, write 0xf0 to 0x402 port notify qemu to stop vcpu after
0x0cf8 port wrote i440 configure register. qemu stop vcpu when catch
0x402 port wrote 0xf0.
seabios:/src/hw/pci.c
@@ -52,6 +52,11 @@ void pci_config_writeb(u16 bdf, u32 addr, u8 val)
writeb(mmconfig_addr(bdf, addr), val);
} else {
outl(ioconfig_cmd(bdf, addr), PORT_PCI_CMD);
+ if (bdf == 0 && addr == 0x72 && val == 0xa) {
+ dprintf(1, "stop vcpu\n");
+ outb(0xf0, 0x402); // notify qemu to stop vcpu
+ dprintf(1, "resume vcpu\n");
+ }
outb(val, PORT_PCI_DATA + (addr & 3));
}
}
qemu:hw/char/debugcon.c
@@ -60,6 +61,9 @@ static void debugcon_ioport_write(void *opaque, hwaddr addr, uint64_t val,
printf(" [debugcon: write addr=0x%04" HWADDR_PRIx " val=0x%02" PRIx64 "]\n", addr, val);
#endif
+ if (ch == 0xf0) {
+ vm_stop(RUN_STATE_PAUSED);
+ }
/* XXX this blocks entire thread. Rewrite to use
* qemu_chr_fe_write and background I/O callbacks */
qemu_chr_fe_write_all(&s->chr, &ch, 1);
step 2. start vm1 by the following command line, and then vm stopped.
$ qemu-system-x86_64 -machine pc-i440fx-5.0,accel=kvm\
-netdev tap,ifname=tap-test,id=hostnet0,vhost=on,downscript=no,script=no\
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=pci.0,addr=0x13,bootindex=3\
-device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2\
-chardev file,id=seabios,path=/var/log/test.seabios,append=on\
-device isa-debugcon,iobase=0x402,chardev=seabios\
-monitor stdio
step 3. start vm2 to accept vm1 state.
$ qemu-system-x86_64 -machine pc-i440fx-5.0,accel=kvm\
-netdev tap,ifname=tap-test1,id=hostnet0,vhost=on,downscript=no,script=no\
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=pci.0,addr=0x13,bootindex=3\
-device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2\
-chardev file,id=seabios,path=/var/log/test.seabios,append=on\
-device isa-debugcon,iobase=0x402,chardev=seabios\
-monitor stdio \
-incoming tcp:127.0.0.1:8000
step 4. execute the following qmp command in vm1 to migrate.
(qemu) migrate tcp:127.0.0.1:8000
step 5. execute the following qmp command in vm2 to resume vcpu.
(qemu) cont
Before this patch, we get KVM "emulation failure" error on vm2.
This patch fixes it.
Cc: qemu-stable@nongnu.org
Signed-off-by: Hogan Wang <hogan.wang@huawei.com>
Message-Id: <20200727084621.3279-1-hogan.wang@huawei.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Plain MAP_FIXED has the undesirable behaviour of splatting exiting
maps so we don't actually achieve what we want when looking for gaps.
We should be using MAP_FIXED_NOREPLACE. As this isn't always available
we need to potentially check the returned address to see if the kernel
gave us what we asked for.
Fixes: ad592e37df ("linux-user: provide fallback pgd_find_hole for bare chroots")
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200724064509.331-9-alex.bennee@linaro.org>
Quoting ISO C99 6.7.8p4, "All the expressions in an initializer for an
object that has static storage duration shall be constant expressions or
string literals".
The compound literal produced by the make_floatx80() macro is not such a
constant expression, per 6.6p7-9. (An implementation may accept it,
according to 6.6p10, but is not required to.)
Therefore using "floatx80_zero" and make_floatx80() for initializing
"f2xm1_table" and "fpatan_table" is not portable. And gcc-4.8 in RHEL-7.6
actually chokes on them:
> target/i386/fpu_helper.c:871:5: error: initializer element is not constant
> { make_floatx80(0xbfff, 0x8000000000000000ULL),
> ^
We've had the make_floatx80_init() macro for this purpose since commit
3bf7e40ab9 ("softfloat: fix for C99", 2012-03-17), so let's use that
macro again.
Fixes: eca30647fc ("target/i386: reimplement f2xm1 using floatx80 operations")
Fixes: ff57bb7b63 ("target/i386: reimplement fpatan using floatx80 operations")
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Link: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg06566.html
Link: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg04714.html
Message-Id: <20200716144251.23004-1-lersek@redhat.com>
Message-Id: <20200724064509.331-8-alex.bennee@linaro.org>
This will be used in a future patch. For POSIX systems _SC_PHYS_PAGES
isn't standardised but at least appears in the man pages for
Open/FreeBSD. The result is advisory so any users of it shouldn't just
fail if we can't work it out.
The win32 stub currently returns 0 until someone with a Windows system
can develop and test a patch.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Message-Id: <20200724064509.331-5-alex.bennee@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEuBi5yt+QicLVzsZrda1lgCoLQhEFAl8beSIACgkQda1lgCoL
QhEYJwf/X+mekAhZ1os77TvEbA+YVoPrXnPEu1UTWVHBWkmzQisk2eRMj1LlA2+T
a+kEpbw+a1oc3GSWBHXJIVWuFOcA+GEDFerpQypSowAVJHPKam3xPBraP/R+bjXq
e3D7WDMfjOE2sZ2Aj1I9sBZnKOI5yg9GcQ2PjB6btAB2eKJjns2myvhWiA5XEa/H
l+eKtej3u8CvQ51vIrTxV/pvEqhPl1b4kvjj+40/COSMIcQ5OWdVk3de1ty9I8dP
UjeaPRVXsJqYo/ZsUZS/uAdPIn8Ih77OtJRAEGSNe/KgttPQ/EG94mhiEws8xl+b
JiV5HehRW0LB+f6eVGrQ0SbH7TCkCA==
=/17e
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2020-07-24-1' into staging
Merge tpm 2020/07/24 v1
# gpg: Signature made Sat 25 Jul 2020 01:13:22 BST
# gpg: using RSA key B818B9CADF9089C2D5CEC66B75AD65802A0B4211
# gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211
* remotes/stefanberger/tags/pull-tpm-2020-07-24-1:
tpm_emulator: Report an error if chardev is missing
tpm: Improve help on TPM types when none are available
Revert "tpm: Clean up error reporting in tpm_init_tpmdev()"
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This reverts commit d10e05f15d.
We report some -tpmdev failures, but then continue as if all was fine.
Reproducer:
$ qemu-system-x86_64 -nodefaults -S -display none -monitor stdio -chardev null,id=tpm0 -tpmdev emulator,id=tpm0,chardev=chrtpm -device tpm-tis,tpmdev=tpm0
qemu-system-x86_64: -tpmdev emulator,id=tpm0,chardev=chrtpm: tpm-emulator: tpm chardev 'chrtpm' not found.
qemu-system-x86_64: -tpmdev emulator,id=tpm0,chardev=chrtpm: tpm-emulator: Could not cleanly shutdown the TPM: No such file or directory
QEMU 5.0.90 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: -device tpm-tis,tpmdev=tpm0: Property 'tpm-tis.tpmdev' can't find value 'tpm0'
$ echo $?
1
This is a regression caused by commit d10e05f15d "tpm: Clean up error
reporting in tpm_init_tpmdev()". It's incomplete: be->create(opts)
continues to use error_report(), and we don't set an error when it
fails.
I figure converting the create() methods to Error would make some
sense, but I'm not sure it's worth the effort right now. Revert the
broken commit instead, and add a comment to tpm_init_tpmdev().
Straightforward conflict in tpm.c resolved.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
error_propagate_prepend() "behaves like error_prepend()", and
error_prepend() uses "formatting @fmt, ... like printf()".
error_prepend() checks its format string argument, but
error_propagate_prepend() does not. Fix by addint the format
attribute to error_propagate_prepend() and error_vprepend().
This would have caught the bug fixed in the previous commit.
Missed in commit 4b5766488f "error: Fix use of error_prepend() with
&error_fatal, &error_abort".
Inspired-by: Stefan Weil <sw@weilnetz.de>
Suggested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200723171205.14949-1-philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The main fix is the correction of the goldfish RTC time. On top of that
some small fixes to the recently added vector extensions have been added
(including an assert that fixed a coverity report). There is a change in
the SiFive E debug memory size to match hardware. Finally there is a fix
for PMP accesses.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE9sSsRtSTSGjTuM6PIeENKd+XcFQFAl8YbM8ACgkQIeENKd+X
cFRusgf+NpstlmK/+35DimtJ9oYV2H+NSE7D9HDEdgm2PszYNDjEiWRMBCl1B2JS
3vTutR198USdcJtdXFrFooaxZaMNf0FQJ7p82BUnNOlUNy7vyFlsRQX687KWJh3+
F0t9MsaBVY3G1UiY6vke2vPdcHNG0cAPUwEWjKIU7E7nBmbvNnZZyXxYGC7yjCBI
GQ1TKqso9wKtvAyc6cNGPcsUUM8P+LI5H+UzQR8A1LZ5bohKIQW+xrdJe6HqGMs1
3xZ4tQS2AG5XaaKz74/AdTJSTf80plG2jDomI9fBoNjqRnyPRAlwgzO88Hc24Bcm
RLzL51UaQv+EddxspAW9gH9FHJRvfA==
=6MUF
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/alistair/tags/pull-riscv-to-apply-20200722-1' into staging
This PR contains a few RISC-V fixes.
The main fix is the correction of the goldfish RTC time. On top of that
some small fixes to the recently added vector extensions have been added
(including an assert that fixed a coverity report). There is a change in
the SiFive E debug memory size to match hardware. Finally there is a fix
for PMP accesses.
# gpg: Signature made Wed 22 Jul 2020 17:43:59 BST
# gpg: using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [full]
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8 CE8F 21E1 0D29 DF97 7054
* remotes/alistair/tags/pull-riscv-to-apply-20200722-1:
target/riscv: Fix the range of pmpcfg of CSR funcion table
hw/riscv: sifive_e: Correct debug block size
target/riscv: fix vector index load/store constraints
target/riscv: Quiet Coverity complains about vamo*
goldfish_rtc: Fix non-atomic read behaviour of TIME_LOW/TIME_HIGH
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fix bug in ACPI which were tripping up guests.
Fix a use-after-free with hotplug of virtio devices.
Block ability to create legacy devices which shouldn't have been
there in the first place.
Fix migration error handling with balloon.
Drop some dead code in virtio.
vtd emulation fixup.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl8YK/4PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpCE4H/1+15xjUiKD0sxnvPdKezbDhtAW0YPY/cHC0
KJRWFDbK/+cl9ZkJQBqUXASV3KWnjSKjQrVph6vtg8huqhhDsnha1JGgamhOa9tC
7rH8RkMA6nUF/su8xnkNyNBfG2lHk6ETyKvvTtuLHzjbkzWd6OYtaQAQJTYI6TVB
aY+MCIT7xfucsL6JaHA8BTccOOjz7pxc6dL4NsQCR3cZkwTtB9JOE5UwgM3IyNP/
DcbFyVUDkXYtlpKU/xO+ZICbxCNsZmHzpnV8KJ07vyJdAhL1hRAayMkNG4xLzW0n
f/ZMlJna5jDP3fRqgVvu8XbY3TcCx1XOBD9ebH5E6hvhWnp8oHI=
=SJjI
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi,virtio,pc: bugfixes
Fix bug in ACPI which were tripping up guests.
Fix a use-after-free with hotplug of virtio devices.
Block ability to create legacy devices which shouldn't have been
there in the first place.
Fix migration error handling with balloon.
Drop some dead code in virtio.
vtd emulation fixup.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed 22 Jul 2020 13:07:26 BST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio-pci: Changed vdev to proxy for VirtIO PCI BAR callbacks.
intel_iommu: Use correct shift for 256 bits qi descriptor
virtio: verify that legacy support is not accidentally on
virtio: list legacy-capable devices
virtio-balloon: Replace free page hinting references to 'report' with 'hint'
virtio-balloon: Add locking to prevent possible race when starting hinting
virtio-balloon: Prevent guest from starting a report when we didn't request one
virtio: Drop broken and superfluous object_property_set_link()
acpi: accept byte and word access to core ACPI registers
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The specification says:
0x00 TIME_LOW R: Get current time, then return low-order 32-bits.
0x04 TIME_HIGH R: Return high 32-bits from previous TIME_LOW read.
...
To read the value, the kernel must perform an IO_READ(TIME_LOW),
which returns an unsigned 32-bit value, before an IO_READ(TIME_HIGH),
which returns a signed 32-bit value, corresponding to the higher half
of the full value.
However, we were just returning the current time for both. If the guest
is unlucky enough to read TIME_LOW and TIME_HIGH either side of an
overflow of the lower half, it will see time be in the future, before
jumping backwards on the next read, and Linux currently relies on the
atomicity guaranteed by the spec so is affected by this. Fix this
violation of the spec by caching the correct value for TIME_HIGH
whenever TIME_LOW is read, and returning that value for any TIME_HIGH
read.
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200718004934.83174-1-jrtc27@jrtc27.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Several types of virtio devices had already been around before the
virtio standard was specified. These devices support virtio in legacy
(and transitional) mode.
Devices that have been added in the virtio standard are considered
non-transitional (i.e. with no support for legacy virtio).
Provide a helper function so virtio transports can figure that out
easily.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200707105446.677966-2-cohuck@redhat.com>
Cc: qemu-stable@nongnu.org
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Recently a feature named Free Page Reporting was added to the virtio
balloon. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Message-Id: <20200720175128.21935.93927.stgit@localhost.localdomain>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commits b6d7e9b66f..a43770df5d simplified the error propagation.
Similarly to commit 6fd5bef10b "qom: Make functions taking Error**
return bool, not void", let fw_cfg_add_from_generator() return a
boolean value, not void.
This allow to simplify parse_fw_cfg() and fixes the error handling
issue reported by Coverity (CID 1430396):
In parse_fw_cfg():
Variable assigned once to a constant guards dead code.
Local variable local_err is assigned only once, to a constant
value, making it effectively constant throughout its scope.
If this is not the intent, examine the logic to see if there
is a missing assignment that would make local_err not remain
constant.
It's the call of fw_cfg_add_from_generator():
Error *local_err = NULL;
fw_cfg_add_from_generator(fw_cfg, name, gen_id, errp);
if (local_err) {
error_propagate(errp, local_err);
return -1;
}
return 0;
If it fails, parse_fw_cfg() sets an error and returns 0, which is
wrong. Harmless, because the only caller passes &error_fatal.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: Coverity CID 1430396: 'Constant' variable guards dead code (DEADCODE)
Fixes: 6552d87c48 ("softmmu/vl: Let -fw_cfg option take a 'gen_id' argument")
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200721131911.27380-3-philmd@redhat.com>
Document FWCfgDataGeneratorClass::get_data() return NULL
on error, and non-NULL on success. This allow us to simplify
fw_cfg_add_from_generator(). Since we don't need a local
variable to propagate the error, we can remove the ERRP_GUARD()
macro.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200721131911.27380-2-philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200714160202.3121879-5-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
object_get_canonical_path_component() returns a malloced copy of a
property name on success, null on failure.
19 of its 25 callers immediately free the returned copy.
Change object_get_canonical_path_component() to return the property
name directly. Since modifying the name would be wrong, adjust the
return type to const char *.
Drop the free from the 19 callers become simpler, add the g_strdup()
to the other six.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200714160202.3121879-4-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Document qemu_find_file(), in particular the returned
value which must be freed.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Michael Rolnik <mrolnik@gmail.com>
Tested-by: Michael Rolnik <mrolnik@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200714164257.23330-4-f4bug@amsat.org>
This comment is confuse, reword it a bit.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael Rolnik <mrolnik@gmail.com>
Tested-by: Michael Rolnik <mrolnik@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200714164257.23330-3-f4bug@amsat.org>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1708065
With network backend with 'virtual header' - there was an issue
in 'plen' field. Overall, during TSO, 'plen' would be changed,
but with 'vheader' this field should be set to the size of the
payload itself instead of '0'.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Add documentation comments for the various qdev functions
related to creating and connecting GPIO lines.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200711142425.16283-4-peter.maydell@linaro.org
Add a doc comment for qdev_unrealize(), to go with the new
documentation for the realize part of the qdev lifecycle.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200711142425.16283-3-peter.maydell@linaro.org
The doc-comments which document the qdev API are split between the
header file and the C source files, because as a project we haven't
been consistent about where we put them.
Move all the doc-comments in qdev.c to the header files, so that
users of the APIs don't have to look at the implementation files for
this information.
In the process, unify them into our doc-comment format and expand on
them in some cases to clarify expected use cases.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200711142425.16283-2-peter.maydell@linaro.org
Control this cpu feature via a machine property, much as we do
with secure=on, since both require specialized support in the
machine setup to be functional.
Default MTE to off, since this feature implies extra overhead.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200713213341.590275-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Here are some assorted fixes for qemu-5.1:
* SLOF update with improved TPM handling, and fix for possible stack
overflows on many-vcpu machines
* Fix for NUMA distances on NVLink2 attached GPU memory nodes
* Fixes to fail more gracefully on attempting to plug unsupported PCI bridge types
* Don't allow pnv-psi device to be user created
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl8VK7EACgkQbDjKyiDZ
s5K7HxAAjFAzlKD7AiF7u0TbuvBFx3J3zxIcCnd3W0ViBiZ4FOybjf7/q8R8Wu94
MrNv/15fZLbS6rcUCERFnEr+TpFgZ/mUn0JuoJWI0AUrI+FtUaCj9kznjwfzU0jN
gU75F6R5q1GzS8ENHZWm1xWHVTk3OBj1eWQu8ialx9Kx4TMc9hTdgIYhQoB6+WD3
nyIR6FUlMutYvPcODJS/HHLLT9Nc3w0zQAOYz7B+OgBKWkM61H3L17ITg9eo9YDz
/xPz+41DqYC1FsTcTB91572lbePCURScJc2xE8GvuGMwNmdDoTMq+EALCLlTawIJ
68w6e+y4uymnDwSGRn0j3Wopc6iggEbeIukgO1GZLUwyACOLWXtwGh3SOxEcmsYH
CiUgBkZ0k07lyXAlMmpIwrc90qPXh7Ox4m24DsH+A0eSNPUtuWOht4dLrHbuAkkf
5KMhTBWMOnLxUilrp+U3Xsuo5BUQVAy6eBI1sCYaLHTJIFoBg0G0g7xg7q/23nnc
DX0RtZgjJdlFjfbzFzetSJYzd8Xf5P9Giqx0XZ+w6vpPTXBsDA57MqpICXiEQGSk
OeVp51dWrWL1FIRoEL1O7YZBu57Oi1hpl1JVG3bxCKa+lxiVw6ZLXGL9m8otOc1/
iSr3WpTI9wOo5Ele3lkl0NQjNeGnJ401UpmGCkEclp2zmMdCGrU=
=CYAQ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.1-20200720' into staging
ppc patch queue 20200720
Here are some assorted fixes for qemu-5.1:
* SLOF update with improved TPM handling, and fix for possible stack
overflows on many-vcpu machines
* Fix for NUMA distances on NVLink2 attached GPU memory nodes
* Fixes to fail more gracefully on attempting to plug unsupported PCI bridge types
* Don't allow pnv-psi device to be user created
# gpg: Signature made Mon 20 Jul 2020 06:29:21 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-5.1-20200720:
pseries: Update SLOF firmware image
spapr: Add a new level of NUMA for GPUs
spapr_pci: Robustify support of PCI bridges
ppc/pnv: Make PSI device types not user creatable
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
NUMA nodes corresponding to GPU memory currently have the same
affinity/distance as normal memory nodes. Add a third NUMA associativity
reference point enabling us to give GPU nodes more distance.
This is guest visible information, which shouldn't change under a
running guest across migration between different qemu versions, so make
the change effective only in new (pseries > 5.0) machine types.
Before, `numactl -H` output in a guest with 4 GPUs (nodes 2-5):
node distances:
node 0 1 2 3 4 5
0: 10 40 40 40 40 40
1: 40 10 40 40 40 40
2: 40 40 10 40 40 40
3: 40 40 40 10 40 40
4: 40 40 40 40 10 40
5: 40 40 40 40 40 10
After:
node distances:
node 0 1 2 3 4 5
0: 10 40 80 80 80 80
1: 40 10 80 80 80 80
2: 80 80 10 80 80 80
3: 80 80 80 10 80 80
4: 80 80 80 80 10 80
5: 80 80 80 80 80 10
These are the same distances as on the host, mirroring the change made
to host firmware in skiboot commit f845a648b8cb ("numa/associativity:
Add a new level of NUMA for GPU's").
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Message-Id: <20200716225655.24289-1-arbab@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In commit d88c42ff2c we added new prototype but neglected to
add their documentation. Fix that.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200705224154.16917-6-f4bug@amsat.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
We use "create_simple" names for functions that allocate, initialize,
configure and realize device objects: pci_create_simple(),
isa_create_simple(), usb_create_simple(). For consistency, rename
i2c_create_slave() as i2c_slave_create_simple(). Since we have
to update all the callers, also let it return a I2CSlave object.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200705224154.16917-5-f4bug@amsat.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
The other i2c functions are called i2c_slave_FOO(). Rename as
i2c_slave_realize_and_unref() to be consistent.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200705224154.16917-4-f4bug@amsat.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
We use "new" names for functions that allocate and initialize
device objects: pci_new(), isa_new(), usb_new().
Let's call this one i2c_slave_new(). Since we have to update
all the callers, also let it return a I2CSlave object.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200705224154.16917-3-f4bug@amsat.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
All the callers of aspeed_i2c_get_bus() have a AspeedI2CState and
cast it to a DeviceState with DEVICE(), then aspeed_i2c_get_bus()
cast the DeviceState to an AspeedI2CState with ASPEED_I2C()...
Simplify aspeed_i2c_get_bus() callers by using AspeedI2CState
argument.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200705224154.16917-2-f4bug@amsat.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Version: GnuPG v1
iQEcBAABAgAGBQJfDwlTAAoJEO8Ells5jWIRQSAIAIXTZAn/Ui+9GpqTNtYRTu+n
RngmAtkPim7NFz0R6hv3CjvkKcMQHXvj1JsJkwV47ww+LRiKHTh6U6r9V637hhEc
gI1X1mLOUWcHe1Sj1hqvLUoLnPsnjoigShGbILFTRSInMYiuPbw7xihSyw+MPREK
yheEHztm7DdlnPHp1wCqFJkxYAQMwpThJUwQHbqoGNiYDGZZvfMaigi7bBmOgloz
i3aRc/J7skfK9GOwVXwqbDoHeWRk5No8y/sEXXUZva7fFol8Unfvw5ubSuQY6Nw0
fOB+C4N9o8lz9mIrbPkVqbZ3U+/+XIGUt2/JmOqEL6hhXMedh2261WjhC1K4cT8=
=UURQ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Wed 15 Jul 2020 14:49:07 BST
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
ftgmac100: fix dblac write test
net: detect errors from probing vnet hdr flag for TAP devices
net: check if the file descriptor is valid before using it
qemu-options.hx: Clean up and fix typo for colo-compare
net/colo-compare.c: Expose compare "max_queue_size" to users
hw/net: Added CSO for IPv6
virtio-net: fix removal of failover device
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu_set_nonblock() checks that the file descriptor can be used and, if
not, crashes QEMU. An assert() is used for that. The use of assert() is
used to detect programming error and the coredump will allow to debug
the problem.
But in the case of the tap device, this assert() can be triggered by
a misconfiguration by the user. At startup, it's not a real problem, but it
can also happen during the hot-plug of a new device, and here it's a
problem because we can crash a perfectly healthy system.
For instance:
# ip link add link virbr0 name macvtap0 type macvtap mode bridge
# ip link set macvtap0 up
# TAP=/dev/tap$(ip -o link show macvtap0 | cut -d: -f1)
# qemu-system-x86_64 -machine q35 -device pcie-root-port,id=pcie-root-port-0 -monitor stdio 9<> $TAP
(qemu) netdev_add type=tap,id=hostnet0,vhost=on,fd=9
(qemu) device_add driver=virtio-net-pci,netdev=hostnet0,id=net0,bus=pcie-root-port-0
(qemu) device_del net0
(qemu) netdev_del hostnet0
(qemu) netdev_add type=tap,id=hostnet1,vhost=on,fd=9
qemu-system-x86_64: .../util/oslib-posix.c:247: qemu_set_nonblock: Assertion `f != -1' failed.
Aborted (core dumped)
To avoid that, add a function, qemu_try_set_nonblock(), that allows to report the
problem without crashing.
In the same way, we also update the function for vhostfd in net_init_tap_one() and
for fd in net_init_socket() (both descriptors are provided by the user and can
be wrong).
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Any write to a device might cause a re-arrangement of memory
triggering a TLB flush and potential re-size of the TLB invalidating
previous entries. This would cause users of qemu_plugin_get_hwaddr()
to see the warning:
invalid use of qemu_plugin_get_hwaddr
because of the failed tlb_lookup which should always succeed. To
prevent this we save the IOTLB data in case it is later needed by a
plugin doing a lookup.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200713200415.26214-7-alex.bennee@linaro.org>
- file-posix: Mitigate file fragmentation with extent size hints
- Tighten qemu-img rules on missing backing format
- qemu-img map: Don't limit block status request size
- Fix crash with virtio-scsi and iothreads
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8NsgMRHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9Z0tA//eqauxD7cTEpwrtLNrRtpiBtMG64BBpxz
QfkURzB38bMVahHlwq3Gt7Zcov8V4V7vxK66h688Z/fhw3vmqIeVe8+P6+Y5s9FL
jil8lewHuLTa6xELeugoV7SZXH8AAh1W2fQmiR7EPiOmpSE0wf7C5IShVlX8A04E
r0n09+61qGjRIe1hNTwTtldqQEfx6UGnxQWcQb81JUPA1lZhX3cnPg/j94Bofr+m
v/DbVTfsmUtTMjc0PdU7n4DKTWu8OS5B/X0unF21rTtO//cYBrhAeY3ax2jbFBWi
CIZK8HLI5m9/HFyltql1LOsd+B5TtfnXMfSdvDh2jaVUlto7wTeTnWU1fv4wxUB5
hk7XgJo/y203ebFNHpTmW8tvLfGTP8uqCVfOEFxzjy+JHGrarlbWkwL2LMOFFAZ2
s2WcwlfqiYGFTG4+OFdhPf9qPWKSqMr+jTdZJTse64/c6+YXWHk+pP9lfYEUOgSi
OYwdQUY9uiZ1K13q5Tif2TbFvs+c118xdTgVhAV7VtfPnWc3c647dX7iaq8Szknc
IT93670Iqf/PzEj+L7XUbbLIIsAcmxD0sr7QAQEt7bfiYIDRIQLiVPyzXplETFg2
SEkvtqBovm84ct7pWQzqA6lFvr3oIFDNquR40XFGozHNnlBeNi5s7pXQnqUBLElr
wDDuEi+z5QM=
=DB0q
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- file-posix: Mitigate file fragmentation with extent size hints
- Tighten qemu-img rules on missing backing format
- qemu-img map: Don't limit block status request size
- Fix crash with virtio-scsi and iothreads
# gpg: Signature made Tue 14 Jul 2020 14:24:19 BST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
block: Avoid stale pointer dereference in blk_get_aio_context()
qemu-img: Deprecate use of -b without -F
block: Add support to warn on backing file change without format
iotests: Specify explicit backing format where sensible
qcow2: Deprecate use of qemu-img amend to change backing file
block: Error if backing file fails during creation without -u
qcow: Tolerate backing_fmt=
vmdk: Add trivial backing_fmt support
sheepdog: Add trivial backing_fmt support
block: Finish deprecation of 'qemu-img convert -n -o'
qemu-img: Flush stdout before before potential stderr messages
file-posix: Mitigate file fragmentation with extent size hints
iotests/059: Filter out disk size with more standard filter
qemu-img map: Don't limit block status request size
iotests: Simplify _filter_img_create() a bit
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For now, this is a mechanical addition; all callers pass false. But
the next patch will use it to improve 'qemu-img rebase -u' when
selecting a backing file with no format.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-Id: <20200706203954.341758-10-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Especially when O_DIRECT is used with image files so that the page cache
indirection can't cause a merge of allocating requests, the file will
fragment on the file system layer, with a potentially very small
fragment size (this depends on the requests the guest sent).
On Linux, fragmentation can be reduced by setting an extent size hint
when creating the file (at least on XFS, it can't be set any more after
the first extent has been allocated), basically giving raw files a
"cluster size" for allocation.
This adds a create option to set the extent size hint, and changes the
default from not setting a hint to setting it to 1 MB. The main reason
why qcow2 defaults to smaller cluster sizes is that COW becomes more
expensive, which is not an issue with raw files, so we can choose a
larger size. The tradeoff here is only potentially wasted disk space.
For qcow2 (or other image formats) over file-posix, the advantage should
even be greater because they grow sequentially without leaving holes, so
there won't be wasted space. Setting even larger extent size hints for
such images may make sense. This can be done with the new option, but
let's keep the default conservative for now.
The effect is very visible with a test that intentionally creates a
badly fragmented file with qemu-img bench (the time difference while
creating the file is already remarkable) and then looks at the number of
extents and the time a simple "qemu-img map" takes.
Without an extent size hint:
$ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 25.848 seconds.
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
Run completed in 19.616 seconds.
$ filefrag ~/tmp/test.raw
/home/kwolf/tmp/test.raw: 2000000 extents found
$ time ./qemu-img map ~/tmp/test.raw
Offset Length Mapped to File
0 0x1e8480000 0 /home/kwolf/tmp/test.raw
real 0m1,279s
user 0m0,043s
sys 0m1,226s
With the new default extent size hint of 1 MB:
$ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 11.833 seconds.
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
Run completed in 10.155 seconds.
$ filefrag ~/tmp/test.raw
/home/kwolf/tmp/test.raw: 178 extents found
$ time ./qemu-img map ~/tmp/test.raw
Offset Length Mapped to File
0 0x1e8480000 0 /home/kwolf/tmp/test.raw
real 0m0,061s
user 0m0,040s
sys 0m0,014s
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200707142329.48303-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>