Here's the first ppc target pull request for qemu-4.1. This has a
number of things that have accumulated while qemu-4.0 was frozen.
* A number of emulated MMU improvements from Ben Herrenschmidt
* Assorted cleanups fro Greg Kurz
* A large set of mostly mechanical cleanups from me to make target/ppc
much closer to compliant with the modern coding style
* Support for passthrough of NVIDIA GPUs using NVLink2
As well as some other assorted fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlzCnusACgkQbDjKyiDZ
s5LfhhAAuem5UBGKPKPj33c87HC+GGG+S4y89ic3ebyKplWulGgouHCa4Dnc7Y5m
9MfIEcljRDpuRJCEONo6yg9aaRb3cW2Go9TpTwxmF8o1suG/v5bIQIdiRbBuMa2t
yhNujVg5kkWSU1G4mCZjL9FS2ADPsxsKZVd73DPEqjlNJg981+2qtSnfR8SXhfnk
dSSKxyfC6Hq1+uhGkLI+xtft+BCTWOstjz+efHpZ5l2mbiaMeh7zMKrIXXy/FtKA
ufIyxbZznMS5MAZk7t90YldznfwOCqfh3di1kx8GTZ40LkBKbuI5LLHTG0sT75z5
LHwFuLkBgWmS8RyIRRh9opr7ifrayHx8bQFpW368Qu+PbPzUCcTVIrWUfPmaNR74
CkYJvhiYZfTwKtUeP7b2wUkHpZF4KINI4TKNaS4QAlm3DNbO67DFYkBrytpXsSzv
smEpe+sqlbY40olw9q4ESP80r+kGdEPLkRjfdj0R7qS4fsqAH1bjuSkNqlPaCTJQ
hNsoz2D+f56z0bBq4x8FRzDpqnBkdy4x6PlLxkJuAaV7WAtvq7n7tiMA3TRr/rIB
OYFP2xPNajjP8MfyOB94+S4WDltmsgXoM7HyyvrKp2JBpe7mFjpep5fMp5GUpweV
OOYrTsN1Nuu3kFpeimEc+IOyp1BWXnJF4vHhKTOqHeqZEs5Fgus=
=RpAK
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190426' into staging
ppc patch queue 2019-04-26
Here's the first ppc target pull request for qemu-4.1. This has a
number of things that have accumulated while qemu-4.0 was frozen.
* A number of emulated MMU improvements from Ben Herrenschmidt
* Assorted cleanups fro Greg Kurz
* A large set of mostly mechanical cleanups from me to make target/ppc
much closer to compliant with the modern coding style
* Support for passthrough of NVIDIA GPUs using NVLink2
As well as some other assorted fixes.
# gpg: Signature made Fri 26 Apr 2019 07:02:19 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-4.1-20190426: (36 commits)
target/ppc: improve performance of large BAT invalidations
ppc/hash32: Rework R and C bit updates
ppc/hash64: Rework R and C bit updates
ppc/spapr: Use proper HPTE accessors for H_READ
target/ppc: Don't check UPRT in radix mode when in HV real mode
target/ppc/kvm: Convert DPRINTF to traces
target/ppc/trace-events: Fix trivial typo
spapr: Drop duplicate PCI swizzle code
spapr_pci: Get rid of duplicate code for node name creation
target/ppc: Style fixes for translate/spe-impl.inc.c
target/ppc: Style fixes for translate/vmx-impl.inc.c
target/ppc: Style fixes for translate/vsx-impl.inc.c
target/ppc: Style fixes for translate/fp-impl.inc.c
target/ppc: Style fixes for translate.c
target/ppc: Style fixes for translate_init.inc.c
target/ppc: Style fixes for monitor.c
target/ppc: Style fixes for mmu_helper.c
target/ppc: Style fixes for mmu-hash64.[ch]
target/ppc: Style fixes for mmu-hash32.[ch]
target/ppc: Style fixes for misc_helper.c
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
With MT-TCG, we are now running translation in a racy way, thus
we need to mimic hardware when it comes to updating the R and
C bits, by doing byte stores.
The current "store_hpte" abstraction is ill suited for this, we
replace it with two separate callbacks for setting R and C.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190411080004.8690-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
LSI mapping in spapr currently open-codes standard PCI swizzling. It thus
duplicates the code of pci_swizzle_map_irq_fn().
Expose the swizzling formula so that it can be used with a slot number
when building the device tree. Simply drop pci_spapr_map_irq() and call
pci_swizzle_map_irq_fn() instead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155448184841.8446.13959787238854054119.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Removing RTAS handlers will become necessary when the new pseries
machine supporting multiple interrupt mode is introduced.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190321144914.19934-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.
This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.
This adds additional steps to sPAPR PHB setup:
1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;
2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;
3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;
4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.
This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.
This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.
This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
besides the existing 'shared' flags, we are going to add
'is_pmem' to qemu_ram_mmap(), which indicated the memory backend
file is a persist memory.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <786c46862cfeb253ee0ea2f44d62ffe76edb7fa4.1549555521.git.yi.z.zhang@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The "model[,option...]" string parsed by the function is not just
a CPU model. Rename the function and its argument to indicate it
expects the full "-cpu" option to be provided.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190417025944.16154-2-ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Function find_default_machine() is introduced by commit 2c8cffa599
"vl: make find_default_machine externally visible", and it was used
outside of vl.c until commit a904410af5 "pc_sysfw: remove the rom_only
property".
Commit a904410af5 "pc_sysfw: remove the rom_only property" removed the
only user of find_default_machine() outside vl.c, but neglected to make
it static. Do that now.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190405064121.23662-2-richardw.yang@linux.intel.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJcsHOvAAoJEC7Z13T+cC21IgoP/37yKOLOuagT4an9L573qWPp
xCS48CJ4rNkpWXP3SHuTe+UHSp20sk+5b6rgM/VkLT2d301WS4gxF/VVu85ZFxGX
tkqDwnljb87MqwTbH5Yj/U4mmq8tZkNg4CqmWuvJhWv4aFe6T/YtAzUkl7y7YCcT
QfKqErl363JJKkL7cz+QWopFn5Gl/hy+mvEhbEtexWLIV1UNZ1i2hPWURfkh8FWH
C1047CAfnC/rEJy0GcwkH4iCem8n4LWkMuf3Zehq+Yx+f2e8FfxMkoOtLJVCoKWj
qoMidAplGqUxLoamZsbU1wEzwH6YH28X+uNUULsgDIIBuyW35ymbGsGTmKmNm6ED
zVM1K7badvLeO3PXBxUkviZk7UFjxjXz3xCQMheY47wPoslfX+EN0xUvQ2gW2MvO
Dhb9oPWsr/PFrMMJ35D2OOFH5kJC8Sj30YiP5lsnRoUBi4ecHCIUSlw6esKuYI+H
JfDAYzxe6QqoGg5cxSNUXP+vAgU/FQq+nGNGHzHnsIR4Udt/JsAtNwxQ9DCbYR0C
LA5qxZZTqkDtLPAHynqOzd8m7AoNaBSAgP2qp7Yp8ItXMyemlW2OIYR7yRxuL5bH
zWO/deGYHp3+j9/Z0quzSUL5G85m0o1xgRJcJe9T2fYjWgsy271WFqaZg1JEvpI6
zHcXEw71B7WQuAFaFO1+
=tJvv
-----END PGP SIGNATURE-----
Merge tag 's390-ccw-bios-2019-04-12' into s390-next-staging
Support for booting from a vfio-ccw passthrough dasd device
# gpg: Signature made Fri 12 Apr 2019 01:17:03 PM CEST
# gpg: using RSA key 2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [undefined]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [undefined]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
* tag 's390-ccw-bios-2019-04-12':
pc-bios/s390: Update firmware images
s390-bios: Use control unit type to find bootable devices
s390-bios: Support booting from real dasd device
s390-bios: Add channel command codes/structs needed for dasd-ipl
s390-bios: Use control unit type to determine boot method
s390-bios: Refactor virtio to run channel programs via cio
s390-bios: Factor finding boot device out of virtio code path
s390-bios: Extend find_dev() for non-virtio devices
s390-bios: cio error handling
s390-bios: Support for running format-0/1 channel programs
s390-bios: ptr2u32 and u32toptr
s390-bios: Map low core memory
s390-bios: Decouple channel i/o logic from virtio
s390-bios: Clean up cio.h
s390-bios: decouple common boot logic from virtio
s390-bios: decouple cio setup from virtio
s390 vfio-ccw: Add bootindex property and IPLB data
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Rename qemu_getrampagesize() to qemu_minrampagesize(). While at it,
properly rename find_max_supported_pagesize() to
find_min_backend_pagesize().
s390x is actually interested into the maximum ram pagesize, so
introduce and use qemu_maxrampagesize().
Add a TODO, indicating that looking at any mapped memory backends is not
100% correct in some cases.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190417113143.5551-3-david@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
In order to handle TB's that translate to too much code, we
need to place the control of the length of the translation
in the hands of the code gen master loop.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The previous commits have eliminated fprintf_function outside
disassemblers, simplifying code and cleaning up the ugly type-punning
fprintf_function seems to attract. Move fprintf_function to
include/disas/dis-asm.h to reduce the temptation to abuse it.
I considered renaming it to fprintf_ftype (reverting that part of
commit 6e2d864edf, v0.14.0) to get us closer to binutils, but I
figure the fork is too distant to make this worthwhile.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-18-armbru@redhat.com>
Commit dc99065b5f (v0.1.0) added dis-asm.h from binutils.
Commit 43d4145a98 (v0.1.5) inlined bfd.h into dis-asm.h to remove the
dependency on binutils.
Commit 76cad71136 (v1.4.0) moved dis-asm.h to include/disas/bfd.h.
The new name is confusing when you try to match against (pre GPLv3+)
binutils. Rename it back. Keep it in the same directory, of course.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190417191805.28198-17-armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
INIT_DISASSEMBLE_INFO() takes an fprintf()-like callback and a FILE *
to pass to it. monitor_disas() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Pass qemu_fprintf() and NULL instead.
monitor_fprintf() is now unused; delete it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-16-armbru@redhat.com>
[Commit message typo corrected]
CPUClass method dump_statistics() takes an fprintf()-like callback and
a FILE * to pass to it. Most callers pass fprintf() and stderr.
log_cpu_state() passes fprintf() and qemu_log_file.
hmp_info_registers() passes monitor_fprintf() and the current monitor
cast to FILE *. monitor_fprintf() casts it right back, and is
otherwise identical to monitor_printf().
The callback gets passed around a lot, which is tiresome. The
type-punning around monitor_fprintf() is ugly.
Drop the callback, and call qemu_fprintf() instead. Also gets rid of
the type-punning, since qemu_fprintf() takes NULL instead of the
current monitor cast to FILE *.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-15-armbru@redhat.com>
Code that doesn't want to know about current monitor vs. stdout
vs. stderr takes an fprintf_function callback and a FILE * argument to
pass to it. Actual arguments are either fprintf() and stdout or
stderr, or monitor_fprintf() and the current monitor cast to FILE *.
monitor_fprintf() casts it right back, and is otherwise identical to
monitor_printf(). The type-punning is ugly.
New qemu_fprintf() and qemu_vprintf() address this need without type
punning: they are like fprintf() and vfprintf(), except they print to
the current monitor when passed a null FILE *. The next commits will
put them to use.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-14-armbru@redhat.com>
CPUClass method dump_statistics() takes an fprintf()-like callback and
a FILE * to pass to it.
Its only caller hmp_info_cpustats() (via cpu_dump_statistics()) passes
monitor_fprintf() and the current monitor cast to FILE *.
monitor_fprintf() casts it right back, and is otherwise identical to
monitor_printf(). The type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-13-armbru@redhat.com>
The various TARGET_cpu_list() take an fprintf()-like callback and a
FILE * to pass to it. Their callers (vl.c's main() via list_cpus(),
bsd-user/main.c's main(), linux-user/main.c's main()) all pass
fprintf() and stdout. Thus, the flexibility provided by the (rather
tiresome) indirection isn't actually used.
Drop the callback, and call qemu_printf() instead.
Calling printf() would also work, but would make the code unsuitable
for monitor context without making it simpler.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190417191805.28198-10-armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
mtree_info() takes an fprintf()-like callback and a FILE * to pass to
it, and so do its helper functions. Passing around callback and
argument is rather tiresome.
Its only caller hmp_info_mtree() passes monitor_printf() cast to
fprintf_function and the current monitor cast to FILE *.
The type-punning is technically undefined behaviour, but works in
practice. Clean up: drop the callback, and call qemu_printf()
instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-9-armbru@redhat.com>
bdrv_snapshot_dump(), bdrv_image_info_specific_dump(),
bdrv_image_info_dump() and their helpers take an fprintf()-like
callback and a FILE * to pass to it.
hmp.c passes monitor_printf() cast to fprintf_function and the current
monitor cast to FILE *.
qemu-img.c and qemu-io-cmds.c pass fprintf and stdout.
The type-punning is technically undefined behaviour, but works in
practice. Clean up: drop the callback, and call qemu_printf()
instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-8-armbru@redhat.com>
qsp_report() takes an fprintf()-like callback and a FILE * to pass to
it.
Its only caller hmp_sync_profile() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-7-armbru@redhat.com>
dump_drift_info() takes an fprintf()-like callback and a FILE * to pass
to it.
Its only caller hmp_info_jit() passes monitor_fprintf() and a Monitor
* cast to FILE *. monitor_fprintf() casts it right back, and is
otherwise identical to monitor_printf(). The type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-6-armbru@redhat.com>
dump_exec_info() takes an fprintf()-like callback and a FILE * to pass
to it.
Its only caller hmp_info_jit() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-5-armbru@redhat.com>
dump_opcount_info() takes an fprintf()-like callback and a FILE * to
pass to it.
Its only caller hmp_info_opcount() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-4-armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-2-armbru@redhat.com>
Commit a95db58f21 added monitor_vfprintf() as an error_printf()
generalized from stderr to arbitrary streams, then used it wrapped in
helper out_printf() to print -device/device_add help to stdout. Use
qemu_printf() instead, and delete monitor_vfprintf() and out_printf().
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417190641.26814-16-armbru@redhat.com>
We commonly want to print to the current monitor if we have one, else
to stdout/stderr. For stderr, have error_printf(). For stdout, all
we have is monitor_vfprintf(), which is rather unwieldy. We often
print to stderr just because error_printf() is easier.
New qemu_printf() and qemu_vprintf() do exactly what's needed. The
next commits will put them to use.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417190641.26814-12-armbru@redhat.com>
printf() & friends return the number of characters written on success,
negative value on error.
monitor_printf(), monitor_vfprintf(), monitor_vprintf(),
error_printf(), error_printf_unless_qmp(), error_vprintf(), and
error_vprintf_unless_qmp() return void. Some of them carry a TODO
comment asking for int instead.
Improve them to return int like printf() does.
This makes our use of monitor_printf() as fprintf_function slightly
less dirty: the function cast no longer adds a return value that isn't
there. It still changes a parameter's pointer type. That will be
addressed in a future commit.
monitor_vfprintf() always returns zero. Improve it to return the
proper value.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417190641.26814-11-armbru@redhat.com>
This commit adds a error_init() helper which calls
g_log_set_default_handler() so that glib logs (g_log, g_warning, ...)
are handled similarly to other QEMU logs. This means they will get a
timestamp if timestamps are enabled, and they will go through the
HMP monitor if one is configured.
This commit also adds a call to error_init() to the binaries
installed by QEMU. Since error_init() also calls error_set_progname(),
this means that *-linux-user, *-bsd-user and qemu-pr-helper messages
output with error_report, info_report, ... will slightly change: they
will be prefixed by the binary name.
glib debug messages are enabled through G_MESSAGES_DEBUG similarly to
the glib default log handler.
At the moment, this change will mostly impact SPICE logging if your
spice version is >= 0.14.1. With older spice versions, this is not going
to work as expected, but will not have any ill effect, so this call is
not conditional on the SPICE version.
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190131164614.19209-3-cfergeau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add bootindex property and iplb data for vfio-ccw devices. This allows us to
forward boot information into the bios for vfio-ccw devices.
Refactor s390_get_ccw_device() to return device type. This prevents us from
having to use messy casting logic in several places.
Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1554388475-18329-2-git-send-email-jjherne@linux.ibm.com>
[thuth: fixed "typedef struct VFIOCCWDevice" build failure with clang]
Signed-off-by: Thomas Huth <thuth@redhat.com>
In the accessor functions ld*_he_p() and st*_he_p() we use memcpy()
to perform a load or store to a pointer which might not be aligned
for the size of the type. We rely on the compiler to optimize this
memcpy() into an efficient load or store instruction where possible.
This is required for good performance, but at the moment it is also
required for correct operation, because some users of these functions
require that the access is atomic if the pointer is aligned, which
will only be the case if the compiler has optimized out the memcpy().
(The particular example where we discovered this is the virtio
vring_avail_idx() which calls virtio_lduw_phys_cached() which
eventually ends up calling lduw_he_p().)
Unfortunately some compile environments, such as the fortify-source
setup used in Alpine Linux, define memcpy() to a wrapper function
in a way that inhibits this compiler optimization.
The correct long-term fix here is to add a set of functions for
doing atomic accesses into AddressSpaces (and to other relevant
families of accessor functions like the virtio_*_phys_cached()
ones), and make sure that callsites which want atomic behaviour
use the correct functions.
In the meantime, switch to using __builtin_memcpy() in the
bswap.h accessor functions. This will make us robust against things
like this fortify library in the short term. In the longer term
it will mean that we don't end up with these functions being really
badly-performing even if the semantics of the out-of-line memcpy()
are correct.
Reported-by: Fernando Casas Schössow <casasfernando@outlook.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190318112938.8298-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some PHB implementations, eg. PAPR used on pseries machine, act like
a regular PCI bus rather than a PCIe bus, but allow access to the
PCIe extended config space anyway.
Introduce a new PCI bus class method to modelize this behaviour and
use it when adjusting the config space size limit during accesses.
No behaviour change for existing PCI bus types.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155414130271.574858.4253514266378127489.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch fixes four different things, to maintain bisectability they
have been merged into a single patch. The following fixes are below:
sifive_plic: Fix incorrect irq calculation
The irq is incorrectly calculated to be off by one. It has worked in the
past as the priority_base offset has also been set incorrectly. We are
about to fix the priority_base offset so first first the irq
calculation.
sifive_u: Fix PLIC priority base offset and numbering
According to the FU540 manual the PLIC source priority address starts at
an offset of 0x04 and not 0x00. The same manual also specifies that the
PLIC only has 53 source priorities. Fix these two incorrect header
files.
We also need to over extend the plic_gpios[] array as the PLIC sources
count from 1 and not 0.
riscv: sifive_e: Fix PLIC priority base offset
According to the FE31 manual the PLIC source priority address starts at
an offset of 0x04 and not 0x00.
riscv: virt: Fix PLIC priority base offset
Update the virt offsets based on the newly updated SiFive U and SiFive E
offsets.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
VTD_RTADDR_RTT is dropped even by the VT-d spec, so QEMU should
probably do the same thing (after all we never really implemented it).
Since we've had a field for that in the migration stream, to keep
compatibility we need to fill the hole up.
Please refer to VT-d spec 10.4.6.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190329061422.7926-3-peterx@redhat.com>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Watch IDs are allocated from incrementing a int counter against
the QFileMonitor object. In very long life QEMU processes with
a huge amount of USB MTP activity creating & deleting directories
it is just about conceivable that the int counter can wrap
around. This would result in incorrect behaviour of the file
monitor watch APIs due to clashing watch IDs.
Instead of trying to detect this situation, this patch changes
the way watch IDs are allocated. It is turned into an int64_t
variable where the high 32 bits are set from the underlying
inotify "int" ID. This gives an ID that is guaranteed unique
for the directory as a whole, and we can rely on the kernel
to enforce this. QFileMonitor then sets the low 32 bits from
a per-directory counter.
The USB MTP device only sets watches on the directory as a
whole, not files within, so there is no risk of guest
triggered wrap around on the low 32 bits.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This reverts commit 3df663e575.
This reverts commit b605c47b57.
Command line option --only-migratable is for disallowing any
configuration that can block migration.
Initially, --only-migratable set global variable @only_migratable.
Commit 3df663e575 "migration: move only_migratable to MigrationState"
replaced it by MigrationState member @only_migratable. That was a
mistake.
First, it doesn't make sense on the design level. MigrationState
captures the state of an individual migration, but --only-migratable
isn't a property of an individual migration, it's a restriction on
QEMU configuration. With fault tolerance, we could have several
migrations at once. --only-migratable would certainly protect all of
them. Storing it in MigrationState feels inappropriate.
Second, it contributes to a dependency cycle that manifests itself as
a bug now.
Putting @only_migratable into MigrationState means its available only
after migration_object_init().
We can't set it before migration_object_init(), so we delay setting it
with a global property (this is fixup commit b605c47b57 "migration:
fix handling for --only-migratable").
We can't get it before migration_object_init(), so anything that uses
it can only run afterwards.
Since migrate_add_blocker() needs to obey --only-migratable, any code
adding migration blockers can run only afterwards. This contributes
to the following dependency cycle:
* configure_blockdev() must run before machine_set_property()
so machine properties can refer to block backends
* machine_set_property() before configure_accelerator()
so machine properties like kvm-irqchip get applied
* configure_accelerator() before migration_object_init()
so that Xen's accelerator compat properties get applied.
* migration_object_init() before configure_blockdev()
so configure_blockdev() can add migration blockers
The cycle was closed when recent commit cda4aa9a5a "Create block
backends before setting machine properties" added the first
dependency, and satisfied it by violating the last one. Broke block
backends that add migration blockers.
Moving @only_migratable into MigrationState was a mistake. Revert it.
This doesn't quite break the "migration_object_init() before
configure_blockdev() dependency, since migrate_add_blocker() still has
another dependency on migration_object_init(). To be addressed the
next commit.
Note that the reverted commit made -only-migratable sugar for -global
migration.only-migratable=on below the hood. Documentation has only
ever mentioned -only-migratable. This commit removes the arcane &
undocumented alternative to -only-migratable again. Nobody should be
using it.
Conflicts:
include/migration/misc.h
migration/migration.c
migration/migration.h
vl.c
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190401090827.20793-3-armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
The next patch needs access to a device's minimum permitted
alignment, since NBD wants to advertise this to clients. Add
an accessor function, borrowing from blk_get_max_transfer()
for accessing a backend's block limits.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190329042750.14704-6-eblake@redhat.com>
27461d69a0 "ppc: add host-serial and host-model machine attributes
(CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine
properties for spapr to explicitly control the values advertised to the
guest in device tree properties with the same names.
The previous behaviour on KVM was to unconditionally populate the device
tree with the real host serial number and model, which leaks possibly
sensitive information about the host to the guest.
To maintain compatibility for old machine types, we allowed those props
to be set to "passthrough" to take the value from the host as before. Or
they could be set to "none" to explicitly omit the device tree items.
Special casing specific values on what's otherwise a user supplied string
is very ugly. So, this patch simplifies things by implementing the
backwards compatibility in a different way: we have a machine class flag
set for the older machines, and we only load the host values into the
device tree if A) they're not set by the user and B) we have that flag set.
This does mean that the "passthrough" functionality is no longer available
with the current machine type. That's ok though: if a user or management
layer really wants the information passed through they can read it
themselves (OpenStack Nova already does something similar for x86).
It also means the user can't explicitly ask for the values to be omitted
on the old machine types. I think that's an acceptable trade-off: if you
care enough about not leaking the host information you can either move to
the new machine type, or use a dummy value for the properties.
For the new machine type, this also removes an odd inconsistency
between running on a POWER and non-POWER (or non-Linux) hosts: if the
host information couldn't be read from where we expect (in the host's
device tree as exposed by Linux), we'd fallback to omitting the guest
device tree items.
While we're there, improve some poorly worded comments, and the help text
for the properties.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
We know that the kernel implements a slow fallback code path for
BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it.
The other operations we call in the context of .bdrv_co_pwrite_zeroes
should usually be quick, so no modification should be needed for them.
If we ever notice that there are additional problematic cases, we can
still make these conditional as well.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
For qemu-img convert, we want an operation that zeroes out the whole
image if this can be done efficiently, but that returns an error
otherwise so we don't write explicit zeroes and immediately overwrite
them with the real data, potentially doubling the amount of data to be
written.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
We reject undersized backends with a rather enigmatic "failed to read
the initial flash content" error. For instance:
$ qemu-system-ppc64 -S -display none -M sam460ex -drive if=pflash,format=raw,file=eins.img
qemu-system-ppc64: Initialization of device cfi.pflash02 failed: failed to read the initial flash content
We happily accept oversized images, ignoring their tail. Throwing
away parts of firmware that way is pretty much certain to end in an
even more enigmatic failure to boot.
Require the backend's size to match the device's size exactly. Report
mismatch like this:
qemu-system-ppc64: Initialization of device cfi.pflash01 failed: device requires 1048576 bytes, block backend provides 512 bytes
Improve the error for actual read failures to "can't read block
backend".
To avoid duplicating even more code between the two pflash device
models, do all that in new helper blk_check_size_and_read_all().
The error reporting can still be confusing. For instance:
qemu-system-ppc64 -S -display none -M taihu -drive if=pflash,format=raw,file=eins.img -drive if=pflash,unit=1,format=raw,file=zwei.img
qemu-system-ppc64: Initialization of device cfi.pflash02 failed: device requires 2097152 bytes, block backend provides 512 bytes
Leaves the user guessing which of the two -drive is wrong. Mention
the issue in a TODO comment.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190319163551.32499-2-armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
TYPE_QAUTHZ is an abstract object of type TYPE_OBJECT. All other
are children of TYPE_QAUTHZ, thus also objects.
Keep INTERFACE_CHECK() for interfaces, and use OBJECT_CHECK() on
objects.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Previously we have per-device system memory aliases when DMAR is
disabled by the system. It will slow the system down if there are
lots of devices especially when DMAR is disabled, because each of the
aliased system address space will contain O(N) slots, and rendering
such N address spaces will be O(N^2) complexity.
This patch introduces a shared nodmar memory region and for each
device we only create an alias to the shared memory region. With the
aliasing, QEMU memory core API will be able to detect when devices are
sharing the same address space (which is the nodmar address space)
when rendering the FlatViews and the total number of FlatViews can be
dramatically reduced when there are a lot of devices.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190313094323.18263-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Refer to the RISC-V PSABI specification for details:
- https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Alistair Francis <Alistair.Francis@wdc.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Michael Clark <mjc@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
If renderer_blocked is set do not call virtio_gpu_virgl_reset().
Instead set a flag indicating that virglrenderer needs a reset.
When renderer_blocked gets cleared do the actual reset call.
Without this we can trigger an assert in spice due to calling
spice_qxl_gl_scanout() while another operation is still running:
spice_qxl_gl_scanout: condition `qxl_state->gl_draw_cookie == GL_DRAW_COOKIE_INVALID' failed
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190314115358.26678-2-kraxel@redhat.com
Allow interrogating device internals through HMP interface.
The exposed indicators can be used for troubleshooting by developers or
sysadmin.
There is no need to expose these attributes to a management system (e.x.
libvirt) because (1) most of them are not "device-management' related
info and (2) there is no guarantee the interface is stable.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1552300155-25216-6-git-send-email-yuval.shaia@oracle.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
The BCM2836 control logic module includes a simple
"local timer" which is a programmable down-counter that
can generates an interrupt. Implement this functionality.
Signed-off-by: Zoltán Baldaszti <bztemail@gmail.com>
[PMM: wrote commit message; wrapped long line; tweaked
some comments to match the final version of the code]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- file-posix: Make auto-read-only dynamic
- Add x-blockdev-reopen QMP command
- Finalize block-latency-histogram QMP command
- gluster: Build fixes for newer lib version
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJciAjXAAoJEH8JsnLIjy/WwMcP/3VawpEBO4I94gp7KNUO1yZa
rW7Rk0VO0gcvjk4fyTpQ1I3U2dfX6NwZLFMrk4oUS382QcKL9ky/TXVeeCaWxYxi
51OH6+wHQKu1MuAjM9acXRD59pfOwmI6wbKAgrungeFzHF3TvCYcLD0rY9Mhz1wp
Q7Oqkk2au6cFrmqZChCF2S5guZc0JOuwzd+LdDshRNDek2Px8a3etVq37VBUuxzK
WDvIws1IZkFI5y2WE3T8kn7YJ8NgMZ1p47tgkymDX7fkn3V766tec8ZYBy1Qz9ab
+I3UlzijuXB8vq+egEtzQfJvvTyoPrb65VFjW94ITu9onuclYo1oV5XVgx2c/NiR
WnUagbu9nft1E4+zmSrVB3Y4I7Pbwi+At/2L2dMQXIrrebK50Cqg8GW2fthhq/KM
5NavsqgdH14gOGS1yUGu06J0HO87XiJUKta4Th9M6iKvcGJqZ+F1WZGSiVEhhk1W
w0FSmWdB/XdUwWoWdQnfx8d43OK34Q3spqsAe59DvKKyORw8Uug1i43yWvSepwSf
SILmRYLOpMIrKUihh9NPIxB6QCzv/Mt7WjSEtaif+EXgQIGZhoQBjWpde0h5Dwh5
DrVz6NqDNnz0VDARPq2hgWOs4RlNpMvx5eZFJcK66RsOfLom3CL7mpnVaa+jd6e6
Nf0HPh/ubFB0IeEYbV0n
=dQt+
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- file-posix: Make auto-read-only dynamic
- Add x-blockdev-reopen QMP command
- Finalize block-latency-histogram QMP command
- gluster: Build fixes for newer lib version
# gpg: Signature made Tue 12 Mar 2019 19:30:31 GMT
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (28 commits)
qemu-iotests: Test the x-blockdev-reopen QMP command
block: Add an 'x-blockdev-reopen' QMP command
block: Remove the AioContext parameter from bdrv_reopen_multiple()
block: Add bdrv_reset_options_allowed()
block: Add a 'mutable_opts' field to BlockDriver
block: Allow changing the backing file on reopen
block: Allow omitting the 'backing' option in certain cases
block: Handle child references in bdrv_reopen_queue()
block: Add 'keep_old_opts' parameter to bdrv_reopen_queue()
block: Freeze the backing chain for the duration of the stream job
block: Freeze the backing chain for the duration of the mirror job
block: Freeze the backing chain for the duration of the commit job
block: Allow freezing BdrvChild links
nvme: fix write zeroes offset and count
file-posix: Make auto-read-only dynamic
file-posix: Prepare permission code for fd switching
file-posix: Lock new fd in raw_reopen_prepare()
file-posix: Store BDRVRawState.reopen_state during reopen
file-posix: Factor out raw_reconfigure_getfd()
file-posix: Fix bdrv_open_flags() for snapshot=on
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently we do device realization like below:
hotplug_handler_pre_plug()
dc->realize()
hotplug_handler_plug()
Before we do device realization and plug, we should allocate necessary
resources and check if memory-hotplug-support property is enabled.
At the piix4 and ich9, the memory-hotplug-support property is checked at
plug stage. This means that device has been realized and mapped into guest
address space 'pc_dimm_plug()' by the time acpi plug handler is called,
where it might fail and crash QEMU due to reaching g_assert_not_reached()
(piix4) or error_abort (ich9).
Fix it by checking if memory hotplug is enabled at pre_plug stage
where we can gracefully abort hotplug request.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
CC: Igor Mammedov <imammedo@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190301033548.6691-1-richardw.yang@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Claim ACS support in the generic PCIe root port to allow
passthrough of individual functions of a device to different
guests (in a nested virt.setting) with VFIO.
Without this patch, all functions of a device, such as all VFs of
an SR/IOV device, will end up in the same IOMMU group.
A similar situation occurs on Windows with Hyper-V.
In the single function device case, it also has a small cosmetic
benefit in that the root port itself is not grouped with
the device. VFIO handles that situation in that binding rules
only apply to endpoints, so it does not limit passthrough in
those cases.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-Id: <319460b483f566dd57487eb3dd340ed4c10aa53c.1550768238.git-series.knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Implementing an ACS capability on downstream ports and multifunction
endpoints indicates isolation and IOMMU visibility to a finer
granularity. This creates smaller IOMMU groups in the guest and thus
more flexibility in assigning endpoints to guest userspace or an L2
guest.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Message-Id: <07489975121696f5573b0a92baaf3486ef51e35d.1550768238.git-series.knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
This patch adds support for vhost-user-blk device to get/set
inflight buffer from/to backend.
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Message-Id: <20190228085355.9614-6-xieyongji@baidu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.
Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the
shared buffer from backend. Then qemu should send it back
through VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user.
This shared buffer is used to track inflight I/O by backend.
Qemu should retrieve a new one when vm reset.
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Message-Id: <20190228085355.9614-2-xieyongji@baidu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch adds an option to provide flexibility for user to expose
Scalable Mode to guest. User could expose Scalable Mode to guest by
the config as below:
"-device intel-iommu,caching-mode=on,scalable-mode=on"
The Linux iommu driver has supported scalable mode. Please refer below
patch set:
https://www.spinics.net/lists/kernel/msg2985279.html
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <1551753295-30167-4-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Per Intel(R) VT-d 3.0, the qi_desc is 256 bits in Scalable
Mode. This patch adds emulation of 256bits qi_desc.
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
[Yi Sun is co-developer to rebase and refine the patch.]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1551753295-30167-3-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Intel(R) VT-d 3.0 spec introduces scalable mode address translation to
replace extended context mode. This patch extends current emulator to
support Scalable Mode which includes root table, context table and new
pasid table format change. Now intel_iommu emulates both legacy mode
and scalable mode (with legacy-equivalent capability set).
The key points are below:
1. Extend root table operations to support both legacy mode and scalable
mode.
2. Extend context table operations to support both legacy mode and
scalable mode.
3. Add pasid tabled operations to support scalable mode.
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
[Yi Sun is co-developer to contribute much to refine the whole commit.]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <1551753295-30167-2-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Take a VhostUserState* that can be pre-allocated, and initialize it
with the associated chardev.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Message-Id: <20190308140454.32437-4-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If we reopen a BlockDriverState and there is an option that is present
in bs->options but missing from the new set of options then we have to
return an error unless the driver is able to reset it to its default
value.
This patch adds a new 'mutable_opts' field to BlockDriver. This is
a list of runtime options that can be modified during reopen. If an
option in this list is unspecified on reopen then it must be reset (or
return an error).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch allows the user to change the backing file of an image that
is being reopened. Here's what it does:
- In bdrv_reopen_prepare(): check that the value of 'backing' points
to an existing node or is null. If it points to an existing node it
also needs to make sure that replacing the backing file will not
create a cycle in the node graph (i.e. you cannot reach the parent
from the new backing file).
- In bdrv_reopen_commit(): perform the actual node replacement by
calling bdrv_set_backing_hd().
There may be temporary implicit nodes between a BDS and its backing
file (e.g. a commit filter node). In these cases bdrv_reopen_prepare()
looks for the real (non-implicit) backing file and requires that the
'backing' option points to it. Replacing or detaching a backing file
is forbidden if there are implicit nodes in the middle.
Although x-blockdev-reopen is meant to be used like blockdev-add,
there's an important thing that must be taken into account: the only
way to set a new backing file is by using a reference to an existing
node (previously added with e.g. blockdev-add). If 'backing' contains
a dictionary with a new set of options ({"driver": "qcow2", "file": {
... }}) then it is interpreted that the _existing_ backing file must
be reopened with those options.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Children in QMP are specified with BlockdevRef / BlockdevRefOrNull,
which can contain a set of child options, a child reference, or
NULL. In optional attributes like "backing" it can also be missing.
Only the first case (set of child options) is being handled properly
by bdrv_reopen_queue(). This patch deals with all the others.
Here's how these cases should be handled when bdrv_reopen_queue() is
deciding what to do with each child of a BlockDriverState:
1) Set of child options: if the child was implicitly created (i.e
inherits_from points to the parent) then the options are removed
from the parent's options QDict and are passed to the child with
a recursive bdrv_reopen_queue() call. This case was already
working fine.
2) Child reference: there's two possibilites here.
2a) Reference to the current child: if the child was implicitly
created then it is put in the reopen queue, keeping its
current set of options (since this was a child reference
there was no way to specify a different set of options).
If the child is not implicit then it keeps its current set
of options but it is not reopened (and therefore does not
inherit any new option from the parent).
2b) Reference to a different BDS: the current child is not put
in the reopen queue at all. Passing a reference to a
different BDS can be used to replace a child, although at
the moment no driver implements this, so it results in an
error. In any case, the current child is not going to be
reopened (and might in fact disappear if it's replaced)
3) NULL: This is similar to (2b). Although no driver allows this
yet it can be used to detach the current child so it should not
be put in the reopen queue.
4) Missing option: at the moment "backing" is the only case where
this can happen. With "blockdev-add", leaving "backing" out
means that the default backing file is opened. We don't want to
open a new image during reopen, so we require that "backing" is
always present. We'll relax this requirement a bit in the next
patch. If keep_old_opts is true and "backing" is missing then
this behaves like 2a (the current child is reopened).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_reopen_queue() function is used to create a queue with
the BDSs that are going to be reopened and their new options. Once
the queue is ready bdrv_reopen_multiple() is called to perform the
operation.
The original options from each one of the BDSs are kept, with the new
options passed to bdrv_reopen_queue() applied on top of them.
For "x-blockdev-reopen" we want a function that behaves much like
"blockdev-add". We want to ignore the previous set of options so that
only the ones actually specified by the user are applied, with the
rest having their default values.
One of the things that we need is a way to tell bdrv_reopen_queue()
whether we want to keep the old set of options or not, and that's what
this patch does. All current callers are setting this new parameter to
true and x-blockdev-reopen will set it to false.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Our permission system is useful to define what operations are allowed
on a certain block node and includes things like BLK_PERM_WRITE or
BLK_PERM_RESIZE among others.
One of the permissions is BLK_PERM_GRAPH_MOD which allows "changing
the node that this BdrvChild points to". The exact meaning of this has
never been very clear, but it can be understood as "change any of the
links connected to the node". This can be used to prevent changing a
backing link, but it's too coarse.
This patch adds a new 'frozen' attribute to BdrvChild, which forbids
detaching the link from the node it points to, and new API to freeze
and unfreeze a backing chain.
After this change a few functions can fail, so they need additional
checks.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit a88b179f introduced the ability to set and query bitmap
persistence, but with an atypical spelling.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20190308205845.25734-1-eblake@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Instead of checking against busy, inconsistent, or read only directly,
use a check function with permissions bits that let us streamline the
checks without reproducing them in many places.
Included in this patch are permissions changes that simply add the
inconsistent check to existing permissions call spots, without
addressing existing bugs.
In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT,
which checks against all three conditions. busy-only checks become
BDRV_BITMAP_ALLOW_RO.
Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-4-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Add an inconsistent bit to dirty-bitmaps that allows us to report a bitmap as
persistent but potentially inconsistent, i.e. if we find bitmaps on a qcow2
that have been marked as "in use".
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
These mean the same thing now. Unify them and rename the merged call
bdrv_dirty_bitmap_busy to indicate semantically what we are describing,
as well as help disambiguate from the various _locked and _unlocked
versions of bitmap helpers that refer to mutex locks.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-8-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
"Frozen" was a good description a long time ago, but it isn't adequate now.
Rename the frozen predicate to has_successor to make the semantics of the
predicate more clear to outside callers.
In the process, remove some calls to frozen() that no longer semantically
make sense. For bdrv_enable_dirty_bitmap_locked and
bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU
internals from performing this action when we only wished to prohibit QMP
users from issuing these commands. All of the QMP API commands for bitmap
manipulation already check against user_locked() to prohibit these actions.
Several other assertions really want to check that the bitmap isn't in-use
by another operation -- use the bitmap_user_locked function for this instead,
which presently also checks for has_successor. This leaves some redundant
checks of has_successor through different helpers that are addressed in
forthcoming patches.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This makes vfio_get_region_info_cap() to be used in quirks.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190307050518.64968-3-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The qemu coding standard is to use CamelCase for type and structure names,
and the pseries code follows that... sort of. There are quite a lot of
places where we bend the rules in order to preserve the capitalization of
internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR".
That was a bad idea - it frequently leads to names ending up with hard to
read clusters of capital letters, and means they don't catch the eye as
type identifiers, which is kind of the point of the CamelCase convention in
the first place.
In short, keeping type identifiers look like CamelCase is more important
than preserving standard capitalization of internal "words". So, this
patch renames a heap of spapr internal type names to a more standard
CamelCase.
In addition to case changes, we also make some other identifier renames:
VIOsPAPR* -> SpaprVio*
The reverse word ordering was only ever used to mitigate the capital
cluster, so revert to the natural ordering.
VIOsPAPRVTYDevice -> SpaprVioVty
VIOsPAPRVLANDevice -> SpaprVioVlan
Brevity, since the "Device" didn't add useful information
sPAPRDRConnector -> SpaprDrc
sPAPRDRConnectorClass -> SpaprDrcClass
Brevity, and makes it clearer this is the same thing as a "DRC"
mentioned in many other places in the code
This is 100% a mechanical search-and-replace patch. It will, however,
conflict with essentially any and all outstanding patches touching the
spapr code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The POWER9 processor does not support per-core frequency control. The
cores are arranged in groups of four, along with their respective L2
and L3 caches, into a structure known as a Quad. The frequency must be
managed at the Quad level.
Provide a basic Quad model to fake the settings done by the firmware
on the Non-Cacheable Unit (NCU). Each core pair (EX) needs a special
BAR setting for the TIMA area of XIVE because it resides on the same
address on all chips.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-12-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Provide a new class attribute to define XSCOM operations per CPU
family and add a couple of XSCOM addresses controlling the power
management states of the core on POWER9.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OCC on POWER9 is very similar to the one found on POWER8. Provide
the same routines with P9 values for the registers and IRQ number.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
To ease the introduction of the OCC model for POWER9, provide a new
class attributes to define XSCOM operations per CPU family and a PSI
IRQ number.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is just a simple reminder that SerIRQ routing should be
addressed.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The LPC Controller on POWER9 is very similar to the one found on
POWER8 but accesses are now done via on MMIOs, without the XSCOM and
ECCB logic. The device tree is populated differently so we add a
specific POWER9 routine for the purpose.
SerIRQ routing is yet to be done.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The ISA bus has a different DT nodename on POWER9. Compute the name
when the PnvChip is realized, that is before it is used by the machine
to populate the device tree with the ISA devices.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It will ease the introduction of the LPC Controller model for POWER9.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PSI bridge on POWER9 is very similar to POWER8. The BAR is still
set through XSCOM but the controls are now entirely done with MMIOs.
More interrupts are defined and the interrupt controller interface has
changed to XIVE. The POWER9 model is a first example of the usage of
the notify() handler of the XiveNotifier interface, linking the PSI
XiveSource to its owning device model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
To ease the introduction of the PSI bridge model for POWER9, abstract
the POWER chip differences in a PnvPsi class model and introduce a
specific Pnv8Psi type for POWER8. POWER8 interface to the interrupt
controller is still XICS whereas POWER9 uses the new XIVE model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On sPAPR vfio_listener_region_add() is called in 2 situations:
1. a new listener is registered from vfio_connect_container();
2. a new IOMMU Memory Region is added from rtas_ibm_create_pe_dma_window().
In both cases vfio_listener_region_add() calls
memory_region_iommu_replay() to notify newly registered IOMMU notifiers
about existing mappings which is totally desirable for case 1.
However for case 2 it is nothing but noop as the window has just been
created and has no valid mappings so replaying those does not do anything.
It is barely noticeable with usual guests but if the window happens to be
really big, such no-op replay might take minutes and trigger RCU stall
warnings in the guest.
For example, a upcoming GPU RAM memory region mapped at 64TiB (right
after SPAPR_PCI_LIMIT) causes a 64bit DMA window to be at least 128TiB
which is (128<<40)/0x10000=2.147.483.648 TCEs to replay.
This mitigates the problem by adding an "skipping_replay" flag to
sPAPRTCETable and defining sPAPR own IOMMU MR replay() hook which does
exactly the same thing as the generic one except it returns early if
@skipping_replay==true.
Another way of fixing this would be delaying replay till the very first
H_PUT_TCE but this does not work if in-kernel H_PUT_TCE handler is
enabled (a likely case).
When "ibm,create-pe-dma-window" is complete, the guest will map only
required regions of the huge DMA window.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190307050518.64968-2-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The POWER9 and POWER8 processors have different interrupt controllers,
and reporting their state requires calling different helper routines.
However, the interrupt presenters are still handled in the higher
level pic_print_info() routine because they are not related to the
chip.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The POWER9 and POWER8 processors have a different set of devices and a
different device tree layout.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is a simple model of the POWER9 XIVE interrupt controller for the
PowerNV machine which only addresses the needs of the skiboot
firmware. The PowerNV model reuses the common XIVE framework developed
for sPAPR as the fundamentals aspects are quite the same. The
difference are outlined below.
The controller initial BAR configuration is performed using the XSCOM
bus from there, MMIO are used for further configuration.
The MMIO regions exposed are :
- Interrupt controller registers
- ESB pages for IPIs and ENDs
- Presenter MMIO (Not used)
- Thread Interrupt Management Area MMIO, direct and indirect
The virtualization controller MMIO region containing the IPI ESB pages
and END ESB pages is sub-divided into "sets" which map portions of the
VC region to the different ESB pages. These are modeled with custom
address spaces and the XiveSource and XiveENDSource objects are sized
to the maximum allowed by HW. The memory regions are resized at
run-time using the configuration of EDT set translation table provided
by the firmware.
The XIVE virtualization structure tables (EAT, ENDT, NVTT) are now in
the machine RAM and not in the hypervisor anymore. The firmware
(skiboot) configures these tables using Virtual Structure Descriptor
defining the characteristics of each table : SBE, EAS, END and
NVT. These are later used to access the virtual interrupt entries. The
internal cache of these tables in the interrupt controller is updated
and invalidated using a set of registers.
Still to address to complete the model but not fully required is the
support for block grouping. Escalation support will be necessary for
KVM guests.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The POWER9 PowerNV machine will use a XIVE interrupt presenter type.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PowerNV machine with need to encode the block id in the source
interrupt number before forwarding the source event notification to
the Router.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PowerNV machine can perform indirect loads and stores on the TIMA
on behalf of another CPU. Give the controller the possibility to call
the TIMA memory accessors with a XiveTCTX of its choice.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We will use it to get the CPU interrupt presenter in XIVE when the
TIMA is accessed from the indirect page.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SPAPR_MEMORY_BLOCK_SIZE is logically a difference in memory addresses, and
hence of type hwaddr which is 64-bit. Previously it wasn't marked as such
which means that it could be treated as 32-bit. That will work in some
circumstances but if multiplied by another 32-bit value it could lead to
a 32-bit overflow and an incorrect result.
One specific instance of this in spapr_lmb_dt_populate() was spotted by
Coverity (CID 1399145).
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate
the requirement for a hw-assisted version of the count cache flush
workaround.
The count cache flush workaround is a software workaround which can be
used to flush the count cache on context switch. Some revisions of
hardware may have a hardware accelerated flush, in which case the
software flush can be shortened. This cap is used to set the
availability of such hardware acceleration for the count cache flush
routine.
The availability of such hardware acceleration is indicated by the
H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics
returned from the KVM_PPC_GET_CPU_CHAR ioctl.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com>
[dwg: Small style fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability
for mitigations for indirect branch speculation. Currently the available
values are broken (default), fixed-ibs (fixed by serialising indirect
branches) and fixed-ccd (fixed by diabling the count cache).
Introduce a new value for this capability denoted workaround, meaning that
software can work around the issue by flushing the count cache on
context switch. This option is available if the hypervisor sets the
H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from
the KVM_PPC_GET_CPU_CHAR ioctl.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
availability of the large decrementer for a guest.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301024317.22137-1-sjitindarsingh@gmail.com>
[dwg: Trivial style fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PC machines put firmware in ROM by default. To get it put into
flash memory (required by OVMF), you have to use -drive
if=pflash,unit=0,... and optionally -drive if=pflash,unit=1,...
Why two -drive? This permits setting up one part of the flash memory
read-only, and the other part read/write. It also makes upgrading
firmware on the host easier. Below the hood, it creates two separate
flash devices, because we were too lazy to improve our flash device
models to support sector protection.
The problem at hand is to do the same with -blockdev somehow, as one
more step towards deprecating -drive.
Mapping -drive if=none,... to -blockdev is a solved problem. With
if=T other than if=none, -drive additionally configures a block device
frontend. For non-onboard devices, that part maps to -device. Also a
solved problem. For onboard devices such as PC flash memory, we have
an unsolved problem.
This is actually an instance of a wider problem: our general device
configuration interface doesn't cover onboard devices. Instead, we have
a zoo of ad hoc interfaces that are much more limited. One of them is
-drive, which we'd rather deprecate, but can't until we have suitable
replacements for all its uses.
Sadly, I can't attack the wider problem today. So back to the narrow
problem.
My first idea was to reduce it to its solved buddy by using pluggable
instead of onboard devices for the flash memory. Workable, but it
requires some extra smarts in firmware descriptors and libvirt. Paolo
had an idea that is simpler for libvirt: keep the devices onboard, and
add machine properties for their block backends.
The implementation is less than straightforward, I'm afraid.
First, block backend properties are *qdev* properties. Machines can't
have those, as they're not devices. I could duplicate these qdev
properties as QOM properties, but I hate that.
More seriously, the properties do not belong to the machine, they
belong to the onboard flash devices. Adding them to the machine would
then require bad magic to somehow transfer them to the flash devices.
Fortunately, QOM provides the means to handle exactly this case: add
alias properties to the machine that forward to the onboard devices'
properties.
Properties need to be created in .instance_init() methods. For PC
machines, that's pc_machine_initfn(). To make alias properties work,
we need to create the onboard flash devices there, too. Requires
several bug fixes, in the previous commits. We also have to realize
the devices. More on that below.
If the user sets pflash0, firmware resides in flash memory.
pc_system_firmware_init() maps and realizes the flash devices.
Else, firmware resides in ROM. The onboard flash devices aren't used
then. pc_system_firmware_init() destroys them unrealized, along with
the alias properties.
The existing code to pick up drives defined with -drive if=pflash is
replaced by code to desugar into the machine properties.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <87ftrtux81.fsf@dusky.pond.sub.org>
pc_system_firmware_init() parameter @isapc_ram_fw is PCMachineState
member pci_enabled negated. The next commit will need more of
PCMachineState. To prepare for that, pass a PCMachineState *, and
drop the now redundant parameter @isapc_ram_fw.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190308131445.17502-11-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Add an helper to access the opaque struct PFlashCFI01.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190308131445.17502-9-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
See the previous commit for rationale.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308131445.17502-3-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Compatibility properties started life as a qdev property thing: we
supported them only for qdev properties, and implemented them with the
machinery backing command line option -global.
Recent commit fa0cb34d22 put them to use (tacitly) with memory
backend objects (subtypes of TYPE_MEMORY_BACKEND). To make that
possible, we first moved the work of applying them from the -global
machinery into TYPE_DEVICE's .instance_post_init() method
device_post_init(), in commits ea9ce8934c and b66bbee39f, then made
it available to TYPE_MEMORY_BACKEND's .instance_post_init() method
host_memory_backend_post_init() as object_apply_compat_props(), in
commit 1c3994f6d2.
Note the code smell: we now have function name starting with object_
in hw/core/qdev.c. It has to be there rather than in qom/, because it
calls qdev_get_machine() to find the current accelerator's and
machine's compat_props.
Turns out calling qdev_get_machine() there is problematic. If we
qdev_create() from a machine's .instance_init() method, we call
device_post_init() and thus qdev_get_machine() before main() can
create "/machine" in QOM. qdev_get_machine() tries to get it with
container_get(), which "helpfully" creates it as "container" object,
and returns that. object_apply_compat_props() tries to paper over the
problem by doing nothing when the value of qdev_get_machine() isn't a
TYPE_MACHINE. But the damage is done already: when main() later
attempts to create the real "/machine", it fails with "attempt to add
duplicate property 'machine' to object (type 'container')", and
aborts.
Since no machine .instance_init() calls qdev_create() so far, the bug
is latent. But since I want to do that, I get to fix the bug first.
Observe that object_apply_compat_props() doesn't actually need the
MachineState, only its the compat_props member of its MachineClass and
AccelClass. This permits a simple fix: register MachineClass and
AccelClass compat_props with the object_apply_compat_props() machinery
right after these classes get selected.
This is actually similar to how things worked before commits
ea9ce8934c and b66bbee39f, except we now register much earlier. The
old code registered them only after the machine's .instance_init()
ran, which would've broken compatibility properties for any devices
created there.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20190308131445.17502-2-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Our pflash devices are simplistically modelled has having
"num-blocks" sectors of equal size "sector-length". Real hardware
commonly has sectors of different sizes. How our "sector-length"
property is related to the physical device's multiple sector sizes
is unclear.
Helper functions pflash_cfi01_register() and pflash_cfi02_register()
create a pflash device, set properties including "sector-length" and
"num-blocks", and realize. They take parameters @size, @sector_len
and @nb_blocs.
QOMification left parameter @size unused. Obviously, @size should
match @sector_len and @nb_blocs, i.e. size == sector_len * nb_blocs.
All callers satisfy this.
Remove @nb_blocs and compute it from @size and @sector_len.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-16-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
QOMification left parameter @qdev unused in pflash_cfi01_register()
and pflash_cfi02_register(). All callers pass NULL. Remove.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308094610.21210-15-armbru@redhat.com>
We have two open-coded copies of macro PFLASH_CFI01(). Move the macro
to the header, so we can ditch the copies. Move PFLASH_CFI02() to the
header for symmetry.
We define macros TYPE_PFLASH_CFI01 and TYPE_PFLASH_CFI02 for type name
strings, then mostly use the strings. If the macros are worth
defining, they are worth using. Replace the strings by the macros.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-6-armbru@redhat.com>
pflash_cfi01.c and pflash_cfi02.c start their identifiers with
pflash_cfi01_ and pflash_cfi02_ respectively, except for
CFI_PFLASH01(), TYPE_CFI_PFLASH01, CFI_PFLASH02(), TYPE_CFI_PFLASH02.
Rename for consistency.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-5-armbru@redhat.com>
flash.h's incomplete struct pflash_t is completed both in
pflash_cfi01.c and in pflash_cfi02.c. The complete types are
incompatible. This can hide type errors, such as passing a pflash_t
created with pflash_cfi02_register() to pflash_cfi01_get_memory().
Furthermore, POSIX reserves typedef names ending with _t.
Rename the two structs to PFlashCFI01 and PFlashCFI02.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308094610.21210-2-armbru@redhat.com>
Kick the display link up event with a 0.1 sec delay,
so the guest has a chance to notice the link down first.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[update for redefined macro]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch adds EDID support to the vfio display (aka vgpu) code.
When supported by the mdev driver qemu will generate a EDID blob
and pass it on using the new vfio edid region. The EDID blob will
be updated on UI changes (i.e. window resize), so the guest can
adapt.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[remove control flow via macro, use unsigned format specifier]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The 'boot_splash_filedata_size' was introduced as a global variable
in 3d3b8303c6. This variable is used as a 'size' argument to the
fw_cfg_add_file(). This function has an interface contract with its
'data' argument, but there is no such contract for 'size' (this is
not a referenced pointer). We can simply remove it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190308013222.12524-7-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
As NVDIMM support is looming for ARM and SPAPR, let's
move the acpi_nvdimm_state to the generic machine struct
instead of duplicating the same code in several machines.
It is also renamed into nvdimms_state and becomes a pointer.
nvdimm and nvdimm-persistence become generic machine options.
They become guarded by a nvdimm_supported machine class member.
We also add a description for those options.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190308182053.5487-3-eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
As we intend to migrate the acpi_nvdimm_state into
the base machine with a new dimms_state name, let's
also rename the datatype.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308182053.5487-2-eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Guests started with NVDIMMs larger than the underlying host file produce
confusing errors inside the guest. This happens because the guest
accesses pages beyond the end of the file.
Check the pmem file size on startup and print a clear error message if
the size is invalid.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1669053
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Zhang Yi <yi.z.zhang@linux.intel.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190214031004.32522-3-stefanha@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20190307080244.9011-4-kraxel@redhat.com
- qcow2: Support for external data files
- qcow2: Default to 4KB for the qcow2 cache entry size
- Apply block driver whitelist for -drive format=help
- Several qemu-iotests improvements
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcgmYDAAoJEH8JsnLIjy/WiEgP/jirn5n4bFHDSzpofRxgpcEG
2SoBpYtJQvAVgXmg8P1ZKeaX5yiiZpJS475ShuH6C3dnWuaHBBjvQfDkLLrh3v05
4KJyZQUFx0WQZqkEiHxtgmE3iIOAsWe8VzBCZjsFITdp+fN8HlRjKVofyYP0y48G
k9PzMlEe30Wu2s+lq6PEIXgpwIZqOVl1V3C7wDF8Vg/1OOsv+rK0vKD8yra/oQAc
mthM+hjkpa+ohjE5aTeCsDVxO56AStvXv0d3bE0aF/ZtCvQdbVh5coYj1ldNz6VY
Bvx+UP7kPHJw0wesZJXLeVyZUMyHAuu9vW6zKlDYJ7PoIXSLXC7rYyoEofvkAzQL
awSluFj4HpYU1dKseJ8LzrMyUqjJ2eLJ+K48iNIh0vNlVuvX8aR62dIv0Obo3rod
Y7zgyd6mSgDGulDa3xVsD0+eAUUEbLaUDBV80+M7S2/V9YP4Bt7lKbm0SQB4dUxm
eOJfnbMTpIYUUjYtpCkURED3MlTIA51fy28O87TMJwa11sJTXRscysE9oNQNsvwW
2UPhMPLykMI023glhV0vCwgXQ5kktOvpaB3U7LGQhhHd1ed9sdLEB1bO9eKWYr4N
h6QPLBRLGUYd+BFMMIBLQtx8r+mAn1hUZoX4zxTn0lD50Rch3pRnCjMoC1iWqRkS
J204Lzum5kgNPheyVxhy
=GCRI
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- qcow2: Support for external data files
- qcow2: Default to 4KB for the qcow2 cache entry size
- Apply block driver whitelist for -drive format=help
- Several qemu-iotests improvements
# gpg: Signature made Fri 08 Mar 2019 12:54:27 GMT
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (33 commits)
qcow2 spec: Describe string header extensions
qemu-iotests: Add dependency to qemu-nbd tool
ahci-test: Add dependency to qemu-img tool
qemu-iotests: amend with external data file
qemu-iotests: General tests for qcow2 with external data file
qemu-iotests: Preallocation with external data file
qcow2: Implement data-file-raw create option
qcow2: Store data file name in the image
qcow2: Creating images with external data file
qcow2: Add basic data-file infrastructure
qcow2: Support external data file in qemu-img check
qcow2: Return error for snapshot operation with data file
qcow2: External file I/O
qcow2: Prepare qcow2_co_block_status() for data file
qcow2: Return 0/-errno in qcow2_alloc_compressed_cluster_offset()
qcow2: Don't assume 0 is an invalid cluster offset
qcow2: Prepare count_contiguous_clusters() for external data file
qcow2: Prepare qcow2_get_cluster_type() for external data file
qcow2: Pass bs to qcow2_get_cluster_type()
qcow2: Basic definitions for external data files
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Provide an option to force QEMU to always keep the external data file
consistent as a standalone read-only raw image.
At the moment, this means making sure that write_zeroes requests are
forwarded to the data file instead of just updating the metadata, and
checking that no backing file is used.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_iterate_format (which is currently only used for printing out the
formats supported by the block layer) doesn't take format whitelisting
into account.
This creates a problem for tests: they enumerate supported formats to
decide which tests to enable, but then discover that QEMU doesn't let
them actually use some of those formats.
To avoid that, exclude formats that are not whitelisted from
enumeration, if whitelisting is in use. Since we have separate
whitelists for r/w and r/o, take this a parameter to
bdrv_iterate_format, and print two lists of supported formats (r/w and
r/o) in main qemu.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In existing code we create the gcontext dynamically at the first
access of the gcontext from caller. That can bring some complexity
and potential races during using iothread. Since the context itself
is not that big a resource, and we won't have millions of iothread,
let's simply create the gcontext unconditionally.
This will also be a preparation work further to move the thread
context push operation earlier than before (now it's only pushed right
before we want to start running the gmainloop).
Removing the g_once since it's not necessary, while introducing a new
run_gcontext boolean to show whether we want to run the gcontext.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20190306115532.23025-3-peterx@redhat.com
Message-Id: <20190306115532.23025-3-peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Only sending an init-done message using lock+cond seems an overkill to
me. Replacing it with a simpler semaphore.
Meanwhile, init the semaphore unconditionally, then we can destroy it
unconditionally too in finalize which seems cleaner.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20190306115532.23025-2-peterx@redhat.com
Message-Id: <20190306115532.23025-2-peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduced in 64b40bc54a, this definition is no more used since
a0b753dfd3. Remove it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Add qgraph API that allows to add/remove nodes and edges from the graph,
implementation of Depth First Search to discover the paths and basic unit
test to check correctness of the API.
Included also a main executable that takes care of starting the framework,
create the nodes, set the available drivers/machines, discover the path and
run tests.
graph.h provides the public API to manage the graph nodes/edges
graph_extra.h provides a more private API used successively by the gtest integration part
qos-test.c provides the main executable
Signed-off-by: Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
[Paolo's changes compared to the Google Summer of Code submission:
* added subprocess to test options
* refactored object creation to support live migration tests
* removed driver .before callback (unused)
* removed test .after callbacks (replaced by GTest destruction queue)]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
slirp migration code uses QEMU vmstate so far, when building WITH_QEMU.
Introduce slirp_state_{load,save,version}() functions to move the
state saving handling to libslirp side.
So far, the bitstream compatibility should remain equal with current
QEMU, as this is effectively using the same code, with the same format
etc. When libslirp is made standalone, we will need some mechanism to
ensure bitstream compatibility regardless of the libslirp version
installed. See the FIXME note in the code.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20190212162524.31504-3-marcandre.lureau@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
As with the previous patch to qemu-nbd, the nbd-server-start QMP command
also needs to be able to specify authorization when enabling TLS encryption.
First the client must create a QAuthZ object instance using the
'object-add' command:
{
'execute': 'object-add',
'arguments': {
'qom-type': 'authz-list',
'id': 'authz0',
'parameters': {
'policy': 'deny',
'rules': [
{
'match': '*CN=fred',
'policy': 'allow'
}
]
}
}
}
They can then reference this in the new 'tls-authz' parameter when
executing the 'nbd-server-start' command:
{
'execute': 'nbd-server-start',
'arguments': {
'addr': {
'type': 'inet',
'host': '127.0.0.1',
'port': '9000'
},
'tls-creds': 'tls0',
'tls-authz': 'authz0'
}
}
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-3-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.
This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.
For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is
CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
escape the commas in the name and use:
qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
--object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB' \
--tls-creds tls0 \
--tls-authz authz0 \
....other qemu-nbd args...
NB: a real shell command line would not have leading whitespace after
the line continuation, it is just included here for clarity.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20190227162035.18543-2-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: split long line in --help text, tweak 233 to show that whitespace
after ,, in identity= portion is actually okay]
Signed-off-by: Eric Blake <eblake@redhat.com>
Let's use a wrapper instead of looking it up manually. This function can
than be reused when we explicitly want to have the bus hotplug handler
(e.g. when the bus hotplug handler was overwritten by the machine
hotplug handler).
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190228122849.4296-4-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
it will allow to return another hotplug handler than the default
one for a specific bus based device type. Which is needed to handle
non trivial plug/unplug sequences that need the access to resources
configured outside of bus where device is attached.
That will allow for returned hotplug handler to orchestrate wiring
in arbitrary order, by chaining other hotplug handlers when
it's needed.
PS:
It could be used for hybrid virtio-mem and virtio-pmem devices
where it will return machine as hotplug handler which will do
necessary wiring at machine level and then pass control down
the chain to bus specific hotplug handler.
Example of top level hotplug handler override and custom plug sequence:
some_machine_get_hotplug_handler(machine){
if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
return HOTPLUG_HANDLER(machine);
}
return NULL;
}
some_machine_device_plug(hotplug_dev, dev) {
if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
/* do machine specific initialization */
some_machine_init_special_device(dev)
/* pass control to bus specific handler */
hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev)
}
}
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190228122849.4296-3-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The qbus_is_full(BusState *bus) function (qdev_monitor.c) compares the max_index
value of the BusState structure with the max_dev value of the BusClass structure
to determine whether the maximum number of children has been reached for the
bus. The problem is, the max_index field of the BusState structure does not
necessarily reflect the number of devices that have been plugged into
the bus.
Whenever a child device is plugged into the bus, the bus's max_index value is
assigned to the child device and then incremented. If the child is subsequently
unplugged, the value of the max_index does not change and no longer reflects the
number of children.
When the bus's max_index value reaches the maximum number of devices
allowed for the bus (i.e., the max_dev field in the BusClass structure),
attempts to plug another device will be rejected claiming that the bus is
full -- even if the bus is actually empty.
To resolve the problem, a new 'num_children' field is being added to the
BusState structure to keep track of the number of children plugged into the
bus. It will be incremented when a child is plugged, and decremented when a
child is unplugged.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Pierre Morel<pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <1545062250-7573-1-git-send-email-akrowiak@linux.ibm.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(This replaces the pull sent yesterday)
a) 4 small fixes including the cancel problem
that caused the ahci migration test to fail
intermittently
b) Yury's ignore-shared feature
c) Juan's extra tests
d) Wei Wang's free page hinting
e) Some Colo fixes from Zhang Chen
Diff from yesterdays pull:
1) A missing fix of mine (cleanup during exit)
2) Changes from Eric/Markus on 'Create socket-address parameter'
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcf7GJAAoJEAUWMx68W/3n+lMP/Rl/d7hpi0Ve2fm3VEwoFJea
IRiqo7Yk6heyTCIutFq15pD2ef49AXHpLeGBp9gFNb4bdFTQzHmwOPxeJWig8YXV
m+j5sGRaM9sV8XX24DsZM7yFhpVJmWky8ivMSv3LeEmjx251B9CNL13dc/qVUQHv
lYP6ewnOmtjvR+x+z9Q/+vafnpLWJSxup1G0pZWdQfLpl71E2sMf7FY/G5EroVnf
AXmJb1sjdFXF7n968myfcgYETHsnY0SUa89Bcnd+i40DXvSfa4njXSdE4FOhyIim
n8c4SyRA/Ah2EUl+UGxn8TQ78C4RA3dUS+uXJDmjL1e4ACvqq//nhsfIqTJ9AbgF
Jhx5ArwqrGf7D+/PM5ivDocNplT5JFcCB4OCmZO96Kn0/F6M3UHuL1+IvpQcFMm8
1Ar1REB7BZ6f+QLfY8KKuzVrVRzUBi0DbqFHj5TNIStizOkuUEMMRpcWImBMzslG
531YgTnsSeFfFr13ZJlXDscZSZ5i+fJMjNbH9QpTNy8qmLJoZzbKqpmP4pZmHVI2
w3g1pCHpFejuQtUTNMR3+9mVH5hO+MNrANsTH0yfAXYDNToJ6NkY1nnILHp4P7t1
tqHYN7AO2ZXTTTMSnfyv1+2wh3HZRFB/y7uF6uEowBfuZTRHBHnkaVQp5WbVVSJu
4ovMmHDkcX2bM7VWwTHS
=dk89
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20190306a' into staging
Migation pull 2019-03-06
(This replaces the pull sent yesterday)
a) 4 small fixes including the cancel problem
that caused the ahci migration test to fail
intermittently
b) Yury's ignore-shared feature
c) Juan's extra tests
d) Wei Wang's free page hinting
e) Some Colo fixes from Zhang Chen
Diff from yesterdays pull:
1) A missing fix of mine (cleanup during exit)
2) Changes from Eric/Markus on 'Create socket-address parameter'
# gpg: Signature made Wed 06 Mar 2019 11:39:53 GMT
# gpg: using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20190306a: (22 commits)
qapi/migration.json: Remove a variable that doesn't exist in example
Migration/colo.c: Make COLO node running after failover
Migration/colo.c: Fix double close bug when occur COLO failover
virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
migration/ram.c: add the free page optimization enable flag
migration/ram.c: add a notifier chain for precopy
migration: API to clear bits of guest free pages from the dirty bitmap
migration: use bitmap_mutex in migration_bitmap_clear_dirty
bitmap: bitmap_count_one_with_offset
bitmap: fix bitmap_count_one
tests: Add basic migration precopy tcp test
migration: Create socket-address parameter
tests: Add migration xbzrle test
migration: Add capabilities validation
tests/migration-test: Add a test for ignore-shared capability
migration: Add an ability to ignore shared RAM blocks
migration: Introduce ignore-shared capability
exec: Change RAMBlockIterFunc definition
migration/rdma: clang compilation fix
migration: Cleanup during exit
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.
A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round of ram
save. The free_page_stop is also called to stop the optimization in the
case when there is an error occurred in the process of ram saving.
Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-8-git-send-email-wei.w.wang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Dropped kernel header update, fixed up CMD_ID_* name change
This patch adds the free page optimization enable flag, and a function
to set this flag. When the free page optimization is enabled, not all
the pages are needed to be sent in the bulk stage.
Why using a new flag, instead of directly disabling ram_bulk_stage when
the optimization is running?
Thanks for Peter Xu's reminder that disabling ram_bulk_stage will affect
the use of compression. Please see save_page_use_compression. When
xbzrle and compression are used, if free page optimizaion causes the
ram_bulk_stage to be disabled, save_page_use_compression will return
false, which disables the use of compression. That is, if free page
optimization avoids the sending of half of the guest pages, the other
half of pages loses the benefits of compression in the meantime. Using a
new flag to let migration_bitmap_find_dirty skip the free pages in the
bulk stage will avoid the above issue.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-7-git-send-email-wei.w.wang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This patch adds a notifier chain for the memory precopy. This enables various
precopy optimizations to be invoked at specific places.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-6-git-send-email-wei.w.wang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Spilt the free page block if it crosses the QEMU
RAMBlock boundary.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-5-git-send-email-wei.w.wang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Count the number of 1s in a bitmap starting from an offset.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1544516693-5395-3-git-send-email-wei.w.wang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
BITMAP_LAST_WORD_MASK(nbits) returns 0xffffffff when "nbits=0", which
makes bitmap_count_one fail to handle the "nbits=0" case. It appears to be
preferred to remain BITMAP_LAST_WORD_MASK identical to the kernel
implementation that it is ported from.
So this patch fixes bitmap_count_one to handle the nbits=0 case.
Inital Discussion Link:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg554316.html
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Peter Xu <peterx@redhat.com>
Message-Id: <1544516693-5395-2-git-send-email-wei.w.wang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
If ignore-shared capability is set then skip shared RAMBlocks during the
RAM migration.
Also, move qemu_ram_foreach_migratable_block (and rename) to the
migration code, because it requires access to the migration capabilities.
Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Currently, qemu_ram_foreach_* calls RAMBlockIterFunc with many
block-specific arguments. But often iter func needs RAMBlock*.
This refactoring is needed for fast access to RAMBlock flags from
qemu_ram_foreach_block's callback. The only way to achieve this now
is to call qemu_ram_block_from_host (which also enumerates blocks).
So, this patch reduces complexity of
qemu_ram_foreach_block() -> cb() -> qemu_ram_block_from_host()
from O(n^2) to O(n).
Fix RAMBlockIterFunc definition and add some functions to read
RAMBlock* fields witch were passed.
Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-2-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>