qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Philippe Mathieu-Daudé	664b4be5e8	i386/pc: move vmmouse.c to hw/i386/ It's a x86-only device, so it does not make sense to keep it in the shared misc folder. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé	323d7d1d99	i386/pc: move vmport.c to hw/i386/ It's a x86-only device, so it does not make sense to keep it in the shared misc folder. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé	0d5d8a3a90	hw/misc/pvpanic: extract public API from i386/pc to "hw/misc/pvpanic.h" and remove the old i386/pc dependency. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé	489983d6b4	hw/net/ne2000: extract ne2k-isa code from i386/pc to ne2000-isa.c - add "hw/net/ne2000-isa.h" - remove the old i386 dependency Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Hervé Poussineau <hpoussin@reactos.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> [PPC] Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé	6c646a11bf	hw/timer/mc146818: rename rtc_init() -> mc146818_rtc_init() Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Hervé Poussineau <hpoussin@reactos.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé	acf695eca6	hw/timer/i8254: rename pit_init() -> i8254_pit_init() and remove the old i386/pc dependency Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé	09db4d37d2	misc: remove old i386 dependency Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Philippe Mathieu-Daudé	433545d569	amd_iommu: avoid needless includes in header file instead move them to the source file Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Michael McConville	ab1ce9bd48	mmap(2) returns MAP_FAILED, not NULL, on failure Signed-off-by: Michael McConville <mmcco@mykolab.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Marc-André Lureau	ff5ce21e1b	acpi: change TPM TIS data conditions The device should be exposed if present. It shouldn't have an undefined version (or else backend init failed, and device should fail too). Finally, make the fields specific to TIS device model. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>	2017-12-14 23:39:15 -05:00
Marc-André Lureau	3dfd5a2a50	tpm: lookup the the TPM interface instead of TIS device This will allow to introduce new devices implementing TPM. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>	2017-12-14 23:39:14 -05:00
Igor Mammedov	75ba2ddb18	pc: fix crash on attempted cpu unplug when qemu is started with '-no-acpi' CLI option, an attempt to unplug a CPU using device_del results in null pointer dereference at: #0 object_get_class #1 pc_machine_device_unplug_request_cb #2 qmp_marshal_device_del which is caused by pcms->acpi_dev == NULL due to ACPI support being disabled. Considering that ACPI support is necessary for unplug to work, check that it's enabled and fail unplug request gracefully if no acpi device were found. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-12-01 19:05:58 +02:00
Dou Liyang	7b8be49d36	NUMA: Enable adding NUMA node implicitly Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with that drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Alistair Francis <alistair23@gmail.com> Cc: Takao Indoh <indou.takao@jp.fujitsu.com> Cc: Izumi Taku <izumi.taku@jp.fujitsu.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-11-16 17:46:53 +02:00
Marcel Apfelbaum	9fa99d2519	hw/pci-host: Fix x86 Host Bridges 64bit PCI hole Currently there is no MMIO range over 4G reserved for PCI hotplug. Since the 32bit PCI hole depends on the number of cold-plugged PCI devices and other factors, it is very possible is too small to hotplug PCI devices with large BARs. Fix it by reserving 2G for I4400FX chipset in order to comply with older Win32 Guest OSes and 32G for Q35 chipset. Even if the new defaults of pci-hole64-size will appear in "info qtree" also for older machines, the property was not implemented so no changes will be visible to guests. Note this is a regression since prev QEMU versions had some range reserved for 64bit PCI hotplug. Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-11-16 17:46:53 +02:00
Paolo Bonzini	ab37bfc7d6	pci-assign: Remove Legacy PCI device assignment has been removed from Linux in 4.12, and had been deprecated 2 years ago there. We can remove it from QEMU as well. The ROM loading code was shared with Xen PCI passthrough, so move it to hw/xen. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-11-05 14:52:10 +01:00
Peter Maydell	ab752f237d	x86/cpu/numa queue, 2017-10-27 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZ8z/oAAoJECgHk2+YTcWmqGAQALy+nNFpQi048pydLmtbnv3y EDo4lsq29uimpcgim4GfZKLcrUygqVY6vorGsvRPw6/efWSzLoMHdsYee4cv5V6S fwXI6TIjyDbMZEoQ+TPf6zBPiGT7Ep5nMAl0zspvqZS7ssp0dWGrvAtpXzj3FFfB VwCs3eF7PILMBzMNeRoGrCweJu7mOMhTa7FF3o1FG135AoDljnL2oGj0TA/Z333N n3wOzl+rvXvYPGc8wEoixTzFQ1kw6vZwJk3sT77o+Zi2C0ihqcJ04F6cUFEyYdT3 O+nH4rZcCWIEjHDYt4BgjfIigcih75zYHIFdMxzdfeofGThmf4UwnFl7LzJ1KF27 RS3fnqZCGw41hUJ2usmSMkxbliXmohGczx+lN4/la/lSLp/LVoTxBhyfoMZHa8kL E52Waj2D1LfSg0KXJEYy8LgUrrLLABQ0fCEnZTSBcAlGJdd4m8nOTlbGjpovgwPf GKLiu6yHvfpoQuTyRHx/Nojq1UzhZ0BRm7N/2bJZWevvAsbGIP5TRXbnnwiMG5z5 VRwwQ2vqG69SiI+gvjCvaJRGRbxuGANWYVUZAnjNuzL5V8WR+F7D3AaUlucp8Ui5 7C822NHxvKxQoJvPRJspkrdoMu9FPkKdsFoOCywWJlZKn3mSxDkO1k1Gmw1UVSWi Ynvckz9k4NPG8ADj6ySE =ApN/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging x86/cpu/numa queue, 2017-10-27 # gpg: Signature made Fri 27 Oct 2017 15:17:12 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/x86-and-machine-pull-request: (39 commits) x86: Skip check apic_id_limit for Xen numa: fixup parsed NumaNodeOptions earlier mips: r4k: replace cpu_model with cpu_type mips: mipssim: replace cpu_model with cpu_type mips: Magnum/Acer Pica 61: replace cpu_model with cpu_type mips: fulong2e: replace cpu_model with cpu_type mips: malta/boston: replace cpu_model with cpu_type mips: use object_new() instead of gnew()+object_initialize() sparc: leon3: use generic cpu_model parsing sparc: sparc: use generic cpu_model parsing sparc: sun4u/sun4v/niagara: use generic cpu_model parsing sparc: cleanup cpu type name composition tricore: use generic cpu_model parsing tricore: cleanup cpu type name composition unicore32: use generic cpu_model parsing unicore32: cleanup cpu type name composition xtensa: lx60/lx200/ml605/kc705: use generic cpu_model parsing xtensa: sim: use generic cpu_model parsing xtensa: cleanup cpu type name composition sh4: remove SuperHCPUClass::name field ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-10-30 10:11:22 +00:00
Lan Tianyu	1a26f46692	x86: Skip check apic_id_limit for Xen Xen vIOMMU device model will be in Xen hypervisor. Skip vIOMMU check for Xen here when vcpu number is more than 255. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Message-Id: <1502842933-8323-1-git-send-email-tianyu.lan@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-10-27 16:04:28 +02:00
Ross Lagerwall	7cdcca725b	xen: Log errno rather than return value xen_modified_memory() sets errno to communicate what went wrong so log this rather than the return value which is not interesting. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>	2017-10-26 14:26:48 -07:00
Peter Maydell	a8b392ac9a	* TCG 8-byte atomic accesses bugfix (Andrew) * Report disk rotation rate (Daniel) * Report invalid scsi-disk block size configuration (Mark) * KVM and memory API MemoryListener fixes (David, Maxime, Peter Xu) * x86 CPU hotplug crash fix (Igor) * Load/store API documentation (Peter Maydell) * Small fixes by myself and Thomas * qdev DEVICE_DELETED deferral (Michael) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlnnJUgUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMifwf/dTZwtGqvAV4+jezCiZ3MTknz39dM HOGnD3m2xy04QT5LHiwDmaLFXy1y/AUVQm79JMPN4dKoFvtruREoWUq8EU0FCsLZ PkdCbJuXKGiBYMRXkQQxeT8lAyaBQwZdc+O9mYuOrSGZOQscA7SxgClYmzVdVzcy ZNTqkuaw1NDIAapdfGv94WLza4Nb8XX8bFwohgkf4mLDXifhjYHQTbBTfB0NqPxH Rk3HU+wgYUCJRYXpvktESgzRo5sm1aozCRq3f0Y6RV12ylgF6GG4CyN7YcKRn8eh NZbyehHiF5YU2kuvO9SmAB+FqM2+aMtq8uuNuI1Nxgd222MOVaChyWc3jg== =gmUj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * TCG 8-byte atomic accesses bugfix (Andrew) * Report disk rotation rate (Daniel) * Report invalid scsi-disk block size configuration (Mark) * KVM and memory API MemoryListener fixes (David, Maxime, Peter Xu) * x86 CPU hotplug crash fix (Igor) * Load/store API documentation (Peter Maydell) * Small fixes by myself and Thomas * qdev DEVICE_DELETED deferral (Michael) # gpg: Signature made Wed 18 Oct 2017 10:56:24 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (29 commits) scsi: reject configurations with logical block size > physical block size qdev: defer DEVICE_DEL event until instance_finalize() Revert "qdev: Free QemuOpts when the QOM path goes away" qdev: store DeviceState's canonical path to use when unparenting qemu-pr-helper: use new libmultipath API watch_mem_write: implement 8-byte accesses notdirty_mem_write: implement 8-byte accesses memory: reuse section_from_flat_range() kvm: simplify kvm_align_section() kvm: region_add and region_del is not called on updates kvm: fix error message when failing to unregister slot kvm: tolerate non-existing slot for log_start/log_stop/log_sync kvm: fix alignment of ram address memory: call log_start after region_add target/i386: trap on instructions longer than >15 bytes target/i386: introduce x86_ld*_code tco: add trace events docs/devel/loads-stores.rst: Document our various load and store APIs nios2: define tcg_env build: remove CONFIG_LIBDECNUMBER ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-10-19 15:38:07 +01:00
Laurent Vivier	46202d85d7	pc: remove useless hot_add_cpu initialisation Since `4458fb3a79` (pc: Eliminate pc_default_machine_options()), hot_add_cpu is set in pc_machine_class_init(), so we don't need to set it in pc_q35_machine_options(), pc_i440fx_machine_options() and xenfv_machine_options(), except to clear it in pc_i440fx_1_4_machine_opt(). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-10-15 05:54:44 +03:00
Eduardo Habkost	b5dac42492	isapc: Remove unnecessary migration compatibility code We don't touch isapc when we change guest ABI and add new entries to PC_COMPAT_* or new PCMachineClass compat flags. This means isapc never guaranteed guest ABI and cross-QEMU-version live migration compatibility. There's no point in keeping code for kvm-pv-eoi and APIC ID compatibility in pc_init_isa(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-10-15 05:54:44 +03:00
Eduardo Habkost	fd3b02c889	pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of TYPE_PCI_DEVICE, except: 1) The ones that already have INTERFACE_PCIE_DEVICE set: * base-xhci * e1000e * nvme * pvscsi * vfio-pci * virtio-pci * vmxnet3 2) base-pci-bridge Not all PCI bridges are Conventional PCI devices, so INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes that are actually Conventional PCI: * dec-21154-p2p-bridge * i82801b11-bridge * pbm-bridge * pci-bridge The direct subtypes of base-pci-bridge not touched by this patch are: * xilinx-pcie-root: Already marked as PCIe-only. * pcie-pci-bridge: Already marked as PCIe-only. * pcie-port: all non-abstract subtypes of pcie-port are already marked as PCIe-only devices. 3) megasas-base Not all megasas devices are Conventional PCI devices, so the interface names are added to the subclasses registered by megasas_register_types(), according to information in the megasas_devices[] array. "megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas". Acked-by: Alberto Garcia <berto@igalia.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-10-15 05:54:43 +03:00
Marc-André Lureau	5f9252f7cc	fw_cfg: add write callback Reintroduce the write callback that was removed when write support was removed in commit `023e314856`. Contrary to the previous callback implementation, the write_cb callback is called whenever a write happened, so handlers must be ready to handle partial write as necessary. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-10-15 05:54:40 +03:00
Igor Mammedov	6970c5ff13	pc: make sure that plugged CPUs are of the same type heterogeneous cpus are not supported and hotplugging different cpu model crashes QEMU: qemu-system-x86_64 -cpu qemu64 -smp 1,maxcpus=2 (qemu) device_add host-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=foo (qemu) info cpus error: failed to get MSR 0x38d qemu-system-x86_64: target/i386/kvm.c:2121: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Aborted (core dumped) Gracefully fail hotplug process in case of user mistake. Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1507638879-200718-1-git-send-email-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-10-12 12:10:38 +02:00
Jim Somerville	346b1215b1	kvmclock: use the updated system_timer_msr Fixes `e2b6c17` (kvmclock: update system_time_msr address forcibly) which makes a call to get the latest value of the address stored in system_timer_msr, but then uses the old address anyway. Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com> Message-Id: <59b67db0bd15a46ab47c3aa657c81a4c11f168ea.1506702472.git.Jim.Somerville@windriver.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-10-02 14:39:51 +02:00
Dr. David Alan Gilbert	44b1ff319c	migration: pre_save return int Modify the pre_save method on VMStateDescription to return an int rather than void so that it potentially can fail. Changed zillions of devices to make them return 0; the only case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already had an error_report/return case. Note: If you add an error exit in your pre_save you must emit an error_report to say why. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-27 11:35:59 +01:00
Peter Maydell	d3f5433c7b	Machine/CPU/NUMA queue, 2017-09-19 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZwXs9AAoJECgHk2+YTcWmS18P/1OceEmetatwBQ6YWURA0VfU CB+nCHxijlWXU8UiwoTVDOXc11P00V5VO0mMFouUbv+o+/80qyAfNl2DhDVq7ZXo nhIFHKXUR1BX3YbcqfIonH8xCMGtrAexghhhaGPQbx450PqmGyyjBsZsXttNge1S Yt+jzgD9drsZPYw4ZLJjT/tnWI/+8kDPn37jVujzRzApTD9/fq+77ZZq7q25RzQH ISa5OXOQk6pq+tPHvaIFXfOqfhILcpM7u/X7MYVwiA1oBqcLXTDCjDX3cKavKade +i3yrH7ahNgUO1DyT2bMZX6NAFoZDBrwlYgpkw8n+Yf+EUcdPDOHHEcEeaMpdTGx wgWbQrs+xzIg/ulRb8Qqe9FwdXGQbehfFGofd+gnGQ0XuxekT82in+ucMOivQO3x W/azGnzoz6D83stJFIZ93S69SRswqBuj2R8mu821yzqx1EUSNXgfKXz9OPwwFed5 El9YN127F/VAyp0av1CsOg1XgqIujMUGRxGf7eQBfkh1R3C/g2XNPTvR3yaY9L5B zuMJfWLF6r6zL53ymt7/9UVEim295Lia3mNGS5/Min5QGms7edphsDsrubzXsZGq 2owWfAU/KeDH9gNVNNkZdLcEcS5TEz+2oGPR5oeDeB/QlzVdNQ3FeTVlzFavxNQa 8nrzeFcw7VNrIx2gdvsY =LQ8U -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging Machine/CPU/NUMA queue, 2017-09-19 # gpg: Signature made Tue 19 Sep 2017 21:17:01 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: MAINTAINERS: Update git URLs for my trees hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM NUMA: Replace MAX_NODES with nb_numa_nodes in for loop numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly pc: use generic cpu_model parsing vl.c: convert cpu_model to cpu type and set of global properties before machine_init() cpu: make cpu_generic_init() abort QEMU on error qom: cpus: split cpu_generic_init() on feature parsing and cpu creation parts hostmem-file: Add "discard-data" option osdep: Define QEMU_MADV_REMOVE vl: Clean up user-creatable objects when exiting Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-09-20 17:35:36 +01:00
Eduardo Habkost	4926403c25	hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM Currently, Using the fisrt node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds ACPI SRAT, it regards node 0 as the default node to deal with the memory hole(640K-1M). this means the node0 must have some memory(>1M), but, actually it can have no memory. Fix this problem by cut out the 640K hole in the same way the PCI 4G hole does. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-Id: <1504231805-30957-2-git-send-email-douly.fnst@cn.fujitsu.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-09-19 16:51:33 -03:00
Igor Mammedov	79e0793614	numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed Calculating default node-ids for CPUs in possible_cpu_arch_ids() is rather fragile since defaults calculation uses nb_numa_nodes but callback might be potentially called early before all -numa CLI options are parsed, which would lead to cpus assigned only upto nb_numa_nodes at the time possible_cpu_arch_ids() is called. Issue was introduced by (`7c88e65` numa: mirror cpu to node mapping in MachineState::possible_cpus) and for example CLI: -smp 4 -numa node,cpus=0 -numa node would set props.node-id in possible_cpus array for every non explicitly mapped CPU to the first node. Issue is not visible to guest nor to mgmt interface due to 1) implictly mapped cpus are forced to the first node in case of partial mapping 2) in case of default mapping possible_cpu_arch_ids() is called after all -numa options are parsed (resulting in correct mapping). However it's fragile to rely on late execution of possible_cpu_arch_ids(), therefore add machine specific callback that returns node-id for CPU and use it to calculate/ set defaults at machine_numa_finish_init() time when all -numa options are parsed. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-09-19 16:51:33 -03:00
Alistair Francis	b62e39b469	General warn report fixups Tidy up some of the warn_report() messages after having converted them to use warn_report(). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <9cb1d23551898c9c9a5f84da6773e99871285120.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Alistair Francis	8297be80f7	Convert multi-line fprintf() to warn_report() Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these commands: find ./* -type f -exec sed -i \ 'N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N;N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + Indentation fixed up manually afterwards. Some of the lines were manually edited to reduce the line length to below 80 charecters. Some of the lines with newlines in the middle of the string were also manually edit to avoid checkpatch errrors. The #include lines were manually updated to allow the code to compile. Several of the warning messages can be improved after this patch, to keep this patch mechanical this has been moved into a later patch. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Jason Wang <jasowang@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Alistair Francis	2ab4b13563	Convert single line fprintf(.../n) to warn_report() Convert all the single line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using this command: find ./* -type f -exec sed -i \ 's\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig' \ {} + Some of the lines were manually edited to reduce the line length to below 80 charecters. The #include lines were manually updated to allow the code to compile. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael Roth <mdroth@linux.vnet.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> [mips] Message-Id: <ae8f8a7f0a88ded61743dff2adade21f8122a9e7.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Alistair Francis	9e5d2c5273	hw/i386: Improve some of the warning messages Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1d6ef2ccd9667878ed5820fcf17eef35957ea5d8.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Prasad J Pandit	ed4f86e8b6	multiboot: validate multiboot header address values While loading kernel via multiboot-v1 image, (flags & 0x00010000) indicates that multiboot header contains valid addresses to load the kernel image. These addresses are used to compute kernel size and kernel text offset in the OS image. Validate these address values to avoid an OOB access issue. This is CVE-2017-14167. Reported-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20170907063256.7418-1-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:33 +02:00
Igor Mammedov	311ca98d16	pc: use generic cpu_model parsing define default CPU type in generic way in pc_machine_class_init() and let common machine code to handle cpu_model parsing Patch also introduces TARGET_DEFAULT_CPU_TYPE define for 2 purposes: * make foo_machine_class_init() look uniform on every target * use define in [bsd\|linux]-user targets to pick default cpu type Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1505318697-77161-5-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-09-19 09:09:32 -03:00
Peter Xu	66a4a0318e	intel_iommu: fix missing BQL in pt fast path In vtd_switch_address_space() we did the memory region switch, however it's possible that the caller of it has not taken the BQL at all. Make sure we have it. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-09-08 16:15:17 +03:00
Anthony PERARD	ab938ae43f	hw/acpi: Move acpi_set_pci_info to pcihp HW part of ACPI PCI hotplug in QEMU depends on ACPI_PCIHP_PROP_BSEL being set on a PCI bus that supports ACPI hotplug. It should work regardless of the source of ACPI tables (QEMU generator/legacy SeaBIOS/Xen). So move ACPI_PCIHP_PROP_BSEL initialization into HW ACPI implementation part from QEMU's ACPI table generator. To do PCI passthrough with Xen, the property ACPI_PCIHP_PROP_BSEL needs to be set, but this was done only when ACPI tables are built which is not needed for a Xen guest. The need for the property starts with commit "pc: pcihp: avoid adding ACPI_PCIHP_PROP_BSEL twice" (`f0c9d64a68`). Adding find_i440fx into stubs so that mips-softmmu target can be built. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-09-08 16:15:17 +03:00
Marcel Apfelbaum	a6fd5b0e05	pc: add 2.11 machine types Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-09-08 16:15:17 +03:00
Marc-André Lureau	ad2c19937e	i386: replace g_malloc()+memcpy() with g_memdup() I found these pattern via grepping the source tree. I don't have a coccinelle script for it! Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-08-31 12:29:07 +02:00
Eduardo Habkost	3da2bd8c4a	numa: Move numa_legacy_auto_assign_ram to pc-i440fx-2.9 The 'm->numa_auto_assign_ram = numa_legacy_auto_assign_ram;' line was supposed to be in pc_i440fx_2_9_machine_options() (see commit `3bfe5716` "numa: equally distribute memory on nodes"), but the merge commit `adb354dd` ("Merge remote-tracking branch 'mst/tags/for_upstream' into staging") moved it to the pc_i440fx_2_10_machine_options(). Move the line back to pc_i440fx_2_9_machine_options(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-id: 20170818190943.23858-1-ehabkost@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-08-23 13:53:15 +01:00
Thomas Huth	0479097859	hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev' QEMU currently crashes when trying to use a 'pc-dimm' on the pseries machine without specifying its 'memdev' property. This happens because pc_dimm_get_memory_region() does not check whether the 'memdev' property has properly been set by the user. Looking closer at this function, it's also obvious that it is using &error_abort to call another function - and this is bad in a function that is used in the hot-plugging calling chain since this can also cause QEMU to exit unexpectedly. So let's fix these issues in a proper way now: Add a "Error **errp" parameter to pc_dimm_get_memory_region() which we use in case the 'memdev' property has not been set by the user, and which we can use instead of the &error_abort, and change the callers of get_memory_region() to make use of this "errp" parameter for proper error checking. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-22 21:26:46 +10:00
Aleksandr Bezzubikov	a41c78c135	hw/i386: allow SHPC for Q35 machine Unmask previously masked SHPC feature in _OSC method. Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-08 00:31:09 +03:00
Igor Mammedov	3a3fcc75f9	pc: acpi: force FADT rev1 for 440fx based machine types w2k used to boot on QEMU until revision of FADT has been bumped to rev3 (commit `77af8a2b` hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.) Keep PC machine at rev1 to remain compatible and Q35 at rev3 where w2k isn't supported anyway so OSX could run as well. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:26 +03:00
Igor Mammedov	208fa0e436	pc: make 'pc.rom' readonly when machine has PCI enabled looking at bios ROM mapping in QEMU it seems that only isapc (i.e. not PCI enabled machine) requires ROM being mapped as RW in other cases BIOS is mapped as RO. Do the same for option ROM 'pc.rom' when machine has PCI enabled. As useful side-effect pc.rom MemoryRegion stops being put in vhost memory map (filtered out by vhost_section()), which reduces number of entries by 1. Coincidentally it fixes migration failure reported in "[PATCH V2] vhost: fix a migration failed because of vhost region merge" where following destination CLI with /sys/module/vhost/parameters/max_mem_regions = 8 export DIMMSCOUNT=6 QEMU -enable-kvm \ -netdev type=tap,id=guest0,vhost=on,script=no,vhostforce \ -device virtio-net-pci,netdev=guest0 \ -m 256,slots=256,maxmem=2G \ `i=0; while [ $i -lt $DIMMSCOUNT ]; do echo \ "-object memory-backend-ram,id=m$i,size=128M \ -device pc-dimm,id=d$i,memdev=m$i"; i=$(($i + 1)); \ done` will fail to startup with error: "-device pc-dimm,id=d5,memdev=m5: a used vhost backend has no free memory slots left" while it's possible to add the 6th DIMM during hotplug on source. Issue is caused by the fact that number of entries in vhost map is bigger on 1 entry, when -device is processed, than after guest boots up, and that offending entry belongs to 'pc.rom', it's not like vhost intends to do IO in ROM range so making it RO hides region from vhost and makes number of entries in vhost memory map at -device/machine_done time match number of entries after guest boots. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reported-by: Peng Hao <peng.hao2@zte.com.cn> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:26 +03:00
Peter Xu	07f7b73398	intel_iommu: use access_flags for iotlb It was cached by read/write separately. Let's merge them. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:25 +03:00
Peter Xu	892721d91d	intel_iommu: fix iova for pt IOMMUTLBEntry.iova is returned incorrectly on one PT path (though mostly we cannot really trigger this path, even if we do, we are mostly disgarding this value, so it didn't break anything). Fix it by converting the VTD_PAGE_MASK into the correct definition VTD_PAGE_MASK_4K, then remove VTD_PAGE_MASK. Fixes: b93130 ("intel_iommu: cleanup vtd_{do_}iommu_translate()") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2017-08-02 00:13:25 +03:00
Vladimir Sementsov-Ogievskiy	8908eb1a4a	trace-events: fix code style: print 0x before hex numbers The only exception are groups of numers separated by symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch is made by the following: > find . -name trace-events \| xargs python script.py where script.py is the following python script: ========================= #!/usr/bin/env python import sys import re import fileinput rhex = '%[-+ .0-9](?:[hljztL]\|ll\|hh)?(?:x\|X\|"\sPRI[xX][^"]"?)' rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')') rbad = re.compile('(?<!0x)' + rhex) files = sys.argv[1:] for fname in files: for line in fileinput.input(fname, inplace=True): arr = re.split(rgroup, line) for i in range(0, len(arr), 2): arr[i] = re.sub(rbad, '0x\g<0>', arr[i]) sys.stdout.write(''.join(arr)) ========================= Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Vladimir Sementsov-Ogievskiy	db73ee4bc8	trace-events: fix code style: %# -> 0x% In trace format '#' flag of printf is forbidden. Fix it to '0x%'. This patch is created by the following: check that we have a problem > find . -name trace-events \| xargs grep '%#' \| wc -l 56 check that there are no cases with additional printf flags before '#' > find . -name trace-events \| xargs grep "%[-+ 0'I]+#" \| wc -l 0 check that there are no wrong usage of '#' and '0x' together > find . -name trace-events \| xargs grep '0x%#' \| wc -l 0 fix the problem > find . -name trace-events \| xargs sed -i 's/%#/0x%/g' [Eric Blake noted that xargs grep '%[-+ 0'I]+#' should be xargs grep "%[-+ 0'I]+#" instead so the shell quoting is correct. --Stefan] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170731160135.12101-3-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Philippe Mathieu-Daudé	87e0331c5a	docs: fix broken paths to docs/devel/tracing.txt With the move of some docs/ to docs/devel/ on `ac06724a71`, no references were updated. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-07-31 13:12:53 +03:00
Alexey G	7fb394ad8a	xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings Under certain circumstances normal xen-mapcache functioning may be broken by guest's actions. This may lead to either QEMU performing exit() due to a caught bad pointer (and with QEMU process gone the guest domain simply appears hung afterwards) or actual use of the incorrect pointer inside QEMU address space -- a write to unmapped memory is possible. The bug is hard to reproduce on a i440 machine as multiple DMA sources are required (though it's possible in theory, using multiple emulated devices), but can be reproduced somewhat easily on a Q35 machine using an emulated AHCI controller -- each NCQ queue command slot may be used as an independent DMA source ex. using READ FPDMA QUEUED command, so a single storage device on the AHCI controller port will be enough to produce multiple DMAs (up to 32). The detailed description of the issue follows. Xen-mapcache provides an ability to map parts of a guest memory into QEMU's own address space to work with. There are two types of cache lookups: - translating a guest physical address into a pointer in QEMU's address space, mapping a part of guest domain memory if necessary (while trying to reduce a number of such (re)mappings to a minimum) - translating a QEMU's pointer back to its physical address in guest RAM These lookups are managed via two linked-lists of structures. MapCacheEntry is used for forward cache lookups, while MapCacheRev -- for reverse lookups. Every guest physical address is broken down into 2 parts: address_index = phys_addr >> MCACHE_BUCKET_SHIFT; address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1); MCACHE_BUCKET_SHIFT depends on a system (32/64) and is equal to 20 for a 64-bit system (which assumed for the further description). Basically, this means that we deal with 1 MB chunks and offsets within those 1 MB chunks. All mappings are created with 1MB-granularity, i.e. 1MB/2MB/3MB etc. Most DMA transfers typically are less than 1MB, however, if the transfer crosses any 1MB border(s) - than a nearest larger mapping size will be used, so ex. a 512-byte DMA transfer with the start address 700FFF80h will actually require a 2MB range. Current implementation assumes that MapCacheEntries are unique for a given address_index and size pair and that a single MapCacheEntry may be reused by multiple requests -- in this case the 'lock' field will be larger than 1. On other hand, each requested guest physical address (with 'lock' flag) is described by each own MapCacheRev. So there may be multiple MapCacheRev entries corresponding to a single MapCacheEntry. The xen-mapcache code uses MapCacheRev entries to retrieve the address_index & size pair which in turn used to find a related MapCacheEntry. The 'lock' field within a MapCacheEntry structure is actually a reference counter which shows a number of corresponding MapCacheRev entries. The bug lies in ability for the guest to indirectly manipulate with the xen-mapcache MapCacheEntries list via a special sequence of DMA operations, typically for storage devices. In order to trigger the bug, guest needs to issue DMA operations in specific order and timing. Although xen-mapcache is protected by the mutex lock -- this doesn't help in this case, as the bug is not due to a race condition. Suppose we have 3 DMA transfers, namely A, B and C, where - transfer A crosses 1MB border and thus uses a 2MB mapping - transfers B and C are normal transfers within 1MB range - and all 3 transfers belong to the same address_index In this case, if all these transfers are to be executed one-by-one (without overlaps), no special treatment necessary -- each transfer's mapping lock will be set and then cleared on unmap before starting the next transfer. The situation changes when DMA transfers overlap in time, ex. like this: \|===== transfer A (2MB) =====\| \|===== transfer B (1MB) =====\| \|===== transfer C (1MB) =====\| time ---> In this situation the following sequence of actions happens: 1. transfer A creates a mapping to 2MB area (lock=1) 2. transfer B (1MB) tries to find available mapping but cannot find one because transfer A is still in progress, and it has 2MB size + non-zero lock. So transfer B creates another mapping -- same address_index, but 1MB size. 3. transfer A completes, making 1st mapping entry available by setting its lock to 0 4. transfer C starts and tries to find available mapping entry and sees that 1st entry has lock=0, so it uses this entry but remaps the mapping to a 1MB size 5. transfer B completes and by this time - there are two locked entries in the MapCacheEntry list with the SAME values for both address_index and size - the entry for transfer B actually resides farther in list while transfer C's entry is first 6. xen_ram_addr_from_mapcache() for transfer B gets correct address_index and size pair from corresponding MapCacheRev entry, but then it starts looking for MapCacheEntry with these values and finds the first entry -- which belongs to transfer C. At this point there may be following possible (bad) consequences: 1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value in this statement: raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) + ((unsigned long) ptr - (unsigned long) entry->vaddr_base); resulting in an incorrent raddr value returned from the function. The (ptr - entry->vaddr_base) expression may produce both positive and negative numbers and its actual value may differ greatly as there are many map/unmap operations take place. If the value will be beyond guest RAM limits then a "Bad RAM offset" error will be triggered and logged, followed by exit() in QEMU. 2. If raddr value won't exceed guest RAM boundaries, the same sequence of actions will be performed for xen_invalidate_map_cache_entry() on DMA unmap, resulting in a wrong MapCacheEntry being unmapped while DMA operation which uses it is still active. The above example must be extended by one more DMA transfer in order to allow unmapping as the first mapping in the list is sort of resident. The patch modifies the behavior in which MapCacheEntry's are added to the list, avoiding duplicates. Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>	2017-07-21 17:37:06 -07:00

1 2 3 4 5 ...

1218 Commits