linux

Commit Graph

Author	SHA1	Message	Date
Christoffer Dall	2a2f3e269c	arm64: KVM: Add VGIC device control for arm64 This fixes the build breakage introduced by `c07a0191ef` and adds support for the device control API and save/restore of the VGIC state for ARMv8. The defines were simply missing from the arm64 header files and uaccess.h must be implicitly imported from somewhere else on arm. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-14 11:09:49 +01:00
Dan Carpenter	aac5c4226e	KVM: return an error code in kvm_vm_ioctl_register_coalesced_mmio() If kvm_io_bus_register_dev() fails then it returns success but it should return an error code. I also did a little cleanup like removing an impossible NULL test. Cc: stable@vger.kernel.org Fixes: `2b3c246a68` ('KVM: Make coalesced mmio use a device per zone') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-30 11:56:09 +01:00
Linus Torvalds	7ebd3faa9b	First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJS3TVKAAoJEBvWZb6bTYbyIFgP/2cmt4ifCuFMaZv4+G1S8jZU uC9ZB/+7vzht/p6zAy+4BxurKbHmSBFkC1OKcxYuy7yB4CQkHabzj4V2vRtqFdwH 5lExP9qh3kqaVLuhnvxLTmkktR3EW4PFy6OI53l5kRNktOXSuZ0aN6K3V7tCg/X0 iL7ASo4bJKlxeWcDpmuVrNgAajmZVfXrjKY7robgBQno+yIsgKhRZRBQHjozA6B8 FpCo/k48RZd/EzIbV/PDDRI4hmmry/lgrO9SKjzq56wSqff2bd/k/KYze4dbAPfd Ps60enPTuHmeEjjb4MMMU4EKHVdTQFUMx/xZCmT4xzoh8s4of6RHphXbfE0SUznQ dTveyEQAR7E3JNS0k1+3WEX5fWlFesp0hO2NeE0wzUq4TAr9ztgVO9NQ6Si15e7Z 2HysO0T5Ojtt0lY08/PvS6i48eCAuuBomrejJS8hLW4SUZ5adn+yW4Qo7Fp9JeBR l9a3LsVT8BZMtUWrUuFcVhlM4MbzElUPjDbgWhR8UYU/kpfVZOQu8qWgGKR4UWXy X7/t9l/tjR99CmfMJBAOzJid+ScSpAfg77BdaKiQrVfVIJmsjEjlO8vUMyj5b1HF hPX5wNyJjHAOfridLeHSs4Rdm4a8sk8Az5d4h76pLVz8M4jyTi2v0rO3N4/dU/pu x7N8KR5hAj+mLBoM9/Al =8sYU -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...	2014-01-22 21:40:43 -08:00
Scott Wood	4a55dd7273	kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub Commit `7940876e13` ("kvm: make local functions static") broke KVM PPC builds due to removing (rather than moving) the stub version of kvm_vcpu_eligible_for_directed_yield(). This patch reintroduces it. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Alexander Graf <agraf@suse.de> [Move the #ifdef inside the function. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 12:03:23 +01:00
Paul Bolle	e81d1ad327	kvm: vfio: silence GCC warning Building vfio.o triggers a GCC warning (when building for 32 bits x86): arch/x86/kvm/../../../virt/kvm/vfio.c: In function 'kvm_vfio_set_group': arch/x86/kvm/../../../virt/kvm/vfio.c:104:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] void __user argp = (void __user )arg; ^ Silence this warning by casting arg to unsigned long. argp's current type, "void __user ", is always casted to "int32_t __user ". So its type might as well be changed to "int32_t __user *". Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 12:01:48 +01:00
Stephen Hemminger	ea0269bc34	kvm: remove dead code The function kvm_io_bus_read_cookie is defined but never used in current in-tree code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-08 19:03:00 -02:00
Stephen Hemminger	7940876e13	kvm: make local functions static Running 'make namespacecheck' found lots of functions that should be declared static, since only used in one file. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-01-08 19:02:58 -02:00
Christoffer Dall	fa20f5aea5	KVM: arm-vgic: Support CPU interface reg access Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:02:10 -08:00
Christoffer Dall	90a5355ee7	KVM: arm-vgic: Add GICD_SPENDSGIR and GICD_CPENDSGIR handlers Handle MMIO accesses to the two registers which should support both the case where the VMs want to read/write either of these registers and the case where user space reads/writes these registers to do save/restore of the VGIC state. Note that the added complexity compared to simple set/clear enable registers stems from the bookkeping of source cpu ids. It may be possible to change the underlying data structure to simplify the complexity, but since this is not in the critical path at all, this will do. Also note that reading this register from a live guest will not be accurate compared to on hardware, because some state may be living on the CPU LRs and the only way to give a consistent read would be to force stop all the VCPUs and request them to unqueu the LR state onto the distributor. Until we have an actual user of live reading this register, we can live with the difference. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:02:04 -08:00
Christoffer Dall	cbd333a4bf	KVM: arm-vgic: Support unqueueing of LRs to the dist To properly access the VGIC state from user space it is very unpractical to have to loop through all the LRs in all register access functions. Instead, support moving all pending state from LRs to the distributor, but leave active state LRs alone. Note that to accurately present the active and pending state to VCPUs reading these distributor registers from a live VM, we would have to stop all other VPUs than the calling VCPU and ask each CPU to unqueue their LR state onto the distributor and add fields to track active state on the distributor side as well. We don't have any users of such functionality yet and there are other inaccuracies of the GIC emulation, so don't provide accurate synchronized access to this state just yet. However, when the time comes, having this function should help. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:01:44 -08:00
Christoffer Dall	c07a0191ef	KVM: arm-vgic: Add vgic reg access from dev attr Add infrastructure to handle distributor and cpu interface register accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups and defining the semantics of the attr field to be the MMIO offset as specified in the GICv2 specs. Missing register accesses or other changes in individual register access functions to support save/restore of the VGIC state is added in subsequent patches. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:01:39 -08:00
Christoffer Dall	1006e8cb22	KVM: arm-vgic: Make vgic mmio functions more generic Rename the vgic_ranges array to vgic_dist_ranges to be more specific and to prepare for handling CPU interface register access as well (for save/restore of VGIC state). Pass offset from distributor or interface MMIO base to find_matching_range function instead of the physical address of the access in the VM memory map. This allows other callers unaware of the VM specifics, but with generic VGIC knowledge to reuse the function. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:01:31 -08:00
Christoffer Dall	ce01e4e887	KVM: arm-vgic: Set base addr through device API Support setting the distributor and cpu interface base addresses in the VM physical address space through the KVM_{SET,GET}_DEVICE_ATTR API in addition to the ARM specific API. This has the added benefit of being able to share more code in user space and do things in a uniform manner. Also deprecate the older API at the same time, but backwards compatibility will be maintained. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:01:22 -08:00
Christoffer Dall	7330672bef	KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC Support creating the ARM VGIC device through the KVM_CREATE_DEVICE ioctl, which can then later be leveraged to use the KVM_{GET/SET}_DEVICE_ATTR, which is useful both for setting addresses in a more generic API than the ARM-specific one and is useful for save/restore of VGIC state. Adds KVM_CAP_DEVICE_CTRL to ARM capabilities. Note that we change the check for creating a VGIC from bailing out if any VCPUs were created, to bailing out if any VCPUs were ever run. This is an important distinction that shouldn't break anything, but allows creating the VGIC after the VCPUs have been created. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:01:16 -08:00
Christoffer Dall	e1ba0207a1	ARM: KVM: Allow creating the VGIC after VCPUs Rework the VGIC initialization slightly to allow initialization of the vgic cpu-specific state even if the irqchip (the VGIC) hasn't been created by user space yet. This is safe, because the vgic data structures are already allocated when the CPU is allocated if VGIC support is compiled into the kernel. Further, the init process does not depend on any other information and the sacrifice is a slight performance degradation for creating VMs in the no-VGIC case. The reason is that the new device control API doesn't mandate creating the VGIC before creating the VCPU and it is unreasonable to require user space to create the VGIC before creating the VCPUs. At the same time move the irqchip_in_kernel check out of kvm_vcpu_first_run_init and into the init function to make the per-vcpu and global init functions symmetric and add comments on the exported functions making it a bit easier to understand the init flow by only looking at vgic.c. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:01:06 -08:00
Andre Przywara	39735a3a39	ARM/KVM: save and restore generic timer registers For migration to work we need to save (and later restore) the state of each core's virtual generic timer. Since this is per VCPU, we can use the [gs]et_one_reg ioctl and export the three needed registers (control, counter, compare value). Though they live in cp15 space, we don't use the existing list, since they need special accessor functions and the arch timer is optional. Acked-by: Marc Zynger <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 10:00:15 -08:00
Christoffer Dall	a1a64387ad	arm/arm64: KVM: arch_timer: Initialize cntvoff at kvm_init Initialize the cntvoff at kvm_init_vm time, not before running the VCPUs at the first time because that will overwrite any potentially restored values from user space. Cc: Andre Przywara <andre.przywara@linaro.org> Acked-by: Marc Zynger <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2013-12-21 09:58:57 -08:00
Takuya Yoshikawa	c08ac06ab3	KVM: Use cond_resched() directly and remove useless kvm_resched() Since the commit `15ad7146` ("KVM: Use the scheduler preemption notifiers to make kvm preemptible"), the remaining stuff in this function is a simple cond_resched() call with an extra need_resched() check which was there to avoid dropping VCPUs unnecessarily. Now it is meaningless. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-13 14:23:45 +01:00
Andy Honig	338c7dbadd	KVM: Improve create VCPU parameter (CVE-2013-4587) In multiple functions the vcpu_id is used as an offset into a bitfield. Ag malicious user could specify a vcpu_id greater than 255 in order to set or clear bits in kernel memory. This could be used to elevate priveges in the kernel. This patch verifies that the vcpu_id provided is less than 255. The api documentation already specifies that the vcpu_id must be less than max_vcpus, but this is currently not checked. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:39:33 +01:00
Heiko Carstens	8a3caa6d74	KVM: kvm_clear_guest_page(): fix empty_zero_page usage Using the address of 'empty_zero_page' as source address in order to clear a page is wrong. On some architectures empty_zero_page is only the pointer to the struct page of the empty_zero_page. Therefore the clear page operation would copy the contents of a couple of struct pages instead of clearing a page. For kvm only arm/arm64 are affected by this bug. To fix this use the ZERO_PAGE macro instead which will return the struct page address of the empty_zero_page on all architectures. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-21 11:19:32 +02:00
Linus Torvalds	f080480488	Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc 0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1 7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH NTmT/cfmYp2BRkiCfCiS =rWNf -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM changes from Paolo Bonzini: "Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) kvm, vmx: Fix lazy FPU on nested guest arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu arm/arm64: KVM: MMIO support for BE guest kvm, cpuid: Fix sparse warning kvm: Delete prototype for non-existent function kvm_check_iopl kvm: Delete prototype for non-existent function complete_pio hung_task: add method to reset detector pvclock: detect watchdog reset at pvclock read kvm: optimize out smp_mb after srcu_read_unlock srcu: API for barrier after srcu read unlock KVM: remove vm mmap method KVM: IOMMU: hva align mapping page size KVM: x86: trace cpuid emulation when called from emulator KVM: emulator: cleanup decode_register_operand() a bit KVM: emulator: check rex prefix inside decode_register() KVM: x86: fix emulation of "movzbl %bpl, %eax" kvm_host: typo fix KVM: x86: emulate SAHF instruction MAINTAINERS: add tree for kvm.git Documentation/kvm: add a 00-INDEX file ...	2013-11-15 13:51:36 +09:00
Gleb Natapov	80f5b5e700	KVM: remove vm mmap method It was used in conjunction with KVM_SET_MEMORY_REGION ioctl which was removed by `b74a07beed` in 2010, QEMU stopped using it in 2008, so it is time to remove the code finally. Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-06 09:28:47 +02:00
Greg Edwards	27ef63c7e9	KVM: IOMMU: hva align mapping page size When determining the page size we could use to map with the IOMMU, the page size should also be aligned with the hva, not just the gfn. The gfn may not reflect the real alignment within the hugetlbfs file. Most of the time, this works fine. However, if the hugetlbfs file is backed by non-contiguous huge pages, a multi-huge page memslot starts at an unaligned offset within the hugetlbfs file, and the gfn is aligned with respect to the huge page size, kvm_host_page_size() will return the huge page size and we will use that to map with the IOMMU. When we later unpin that same memslot, the IOMMU returns the unmap size as the huge page size, and we happily unpin that many pfns in monotonically increasing order, not realizing we are spanning non-contiguous huge pages and partially unpin the wrong huge page. Ensure the IOMMU mapping page size is aligned with the hva corresponding to the gfn, which does reflect the alignment within the hugetlbfs file. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Edwards <gedwards@ddn.com> Cc: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-05 09:55:36 +02:00
Gleb Natapov	95f328d3ad	Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queue Conflicts: arch/powerpc/include/asm/processor.h	2013-11-04 10:20:57 +02:00
Alex Williamson	e0f0bbc527	kvm: Create non-coherent DMA registeration We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 19:02:23 +01:00
Alex Williamson	d96eb2c6f4	kvm/x86: Convert iommu_flags to iommu_noncoherent Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 19:02:13 +01:00
Alex Williamson	ec53500fae	kvm: Add VFIO device So far we've succeeded at making KVM and VFIO mostly unaware of each other, but areas are cropping up where a connection beyond eventfds and irqfds needs to be made. This patch introduces a KVM-VFIO device that is meant to be a gateway for such interaction. The user creates the device and can add and remove VFIO groups to it via file descriptors. When a group is added, KVM verifies the group is valid and gets a reference to it via the VFIO external user interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 19:02:03 +01:00
Paolo Bonzini	0c8eb04a62	KVM: use a more sensible error number when debugfs directory creation fails I don't know if this was due to cut and paste, or somebody was really using a D20 to pick the error code for kvm_init_debugfs as suggested by Linus (EFAULT is 14, so the possibility cannot be entirely ruled out). In any case, this patch fixes it. Reported-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 12:15:34 +01:00
Yang Zhang	e0230e1327	KVM: Mapping IOMMU pages after updating memslot In kvm_iommu_map_pages(), we need to know the page size via call kvm_host_page_size(). And it will check whether the target slot is valid before return the right page size. Currently, we will map the iommu pages when creating a new slot. But we call kvm_iommu_map_pages() during preparing the new slot. At that time, the new slot is not visible by domain(still in preparing). So we cannot get the right page size from kvm_host_page_size() and this will break the IOMMU super page logic. The solution is to map the iommu pages after we insert the new slot into domain. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Tested-by: Patrick Lu <patrick.lu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-28 13:15:01 +01:00
Gleb Natapov	13acfd5715	Powerpc KVM work is based on a commit after rc4. Merging master into next to satisfy the dependencies. Conflicts: arch/arm/kvm/reset.c	2013-10-17 17:41:49 +03:00
Aneesh Kumar K.V	5587027ce9	kvm: Add struct kvm arg to memslot APIs We will use that in the later patch to find the kvm ops handler Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-10-17 15:49:23 +02:00
Aneesh Kumar K.V	2ba9f0d887	kvm: powerpc: book3s: Support building HV and PR KVM as module Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in compile fix] Signed-off-by: Alexander Graf <agraf@suse.de>	2013-10-17 15:45:35 +02:00
chai wen	f2e106692d	KVM: Drop FOLL_GET in GUP when doing async page fault Page pinning is not mandatory in kvm async page fault processing since after async page fault event is delivered to a guest it accesses page once again and does its own GUP. Drop the FOLL_GET flag in GUP in async_pf code, and do some simplifying in check/clear processing. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-15 13:43:37 +03:00
Andre Richter	29242cb5c6	virt/kvm/iommu.c: Add leading zeros to device's BDF notation in debug messages When KVM (de)assigns PCI(e) devices to VMs, a debug message is printed including the BDF notation of the respective device. Currently, the BDF notation does not have the commonly used leading zeros. This produces messages like "assign device 0:1:8.0", which look strange at first sight. The patch fixes this by exchanging the printk(KERN_DEBUG ...) with dev_info() and also inserts "kvm" into the debug message, so that it is obvious where the message comes from. Also reduces LoC. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Andre Richter <andre.o.richter@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 11:47:02 +03:00
Gleb Natapov	a2ac07fe29	Fix NULL dereference in gfn_to_hva_prot() gfn_to_memslot() can return NULL or invalid slot. We need to check slot validity before accessing it. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 10:08:52 +03:00
Paolo Bonzini	2f303b74a6	KVM: Convert kvm_lock back to non-raw spinlock In commit `e935b8372c` ("KVM: Convert kvm_lock to raw_spinlock"), the kvm_lock was made a raw lock. However, the kvm mmu_shrink() function tries to grab the (non-raw) mmu_lock within the scope of the raw locked kvm_lock being held. This leads to the following: BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0 Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm] Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt Call Trace: [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm] [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0 [<ffffffff811185bf>] kswapd+0x18f/0x490 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730 [<ffffffff81060d2b>] kthread+0xdb/0xe0 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10 [<ffffffff81060c50>] ? __init_kthread_worker+0x After the previous patch, kvm_lock need not be a raw spinlock anymore, so change it back. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-30 09:21:51 +02:00
Paolo Bonzini	4a937f96f3	KVM: protect kvm_usage_count with its own spinlock The VM list need not be protected by a raw spinlock. Separate the two so that kvm_lock can be made non-raw. Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-30 09:21:46 +02:00
Paolo Bonzini	4fa92fb25a	KVM: cleanup (physical) CPU hotplug Remove the useless argument, and do not do anything if there are no VMs running at the time of the hotplug. Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-30 09:21:30 +02:00
Radim Krčmář	98fda16929	kvm: remove .done from struct kvm_async_pf '.done' is used to mark the completion of 'async_pf_execute()', but 'cancel_work_sync()' returns true when the work was canceled, so we use it instead. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-24 19:12:12 +02:00
Radim Krčmář	28b441e240	kvm: free resources after canceling async_pf When we cancel 'async_pf_execute()', we should behave as if the work was never scheduled in 'kvm_setup_async_pf()'. Fixes a bug when we can't unload module because the vm wasn't destroyed. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-17 12:53:15 +03:00
Paolo Bonzini	ba6a354154	KVM: mmu: allow page tables to be in read-only slots Page tables in a read-only memory slot will currently cause a triple fault because the page walker uses gfn_to_hva and it fails on such a slot. OVMF uses such a page table; however, real hardware seems to be fine with that as long as the accessed/dirty bits are set. Save whether the slot is readonly, and later check it when updating the accessed and dirty bits. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-17 12:52:31 +03:00
Linus Torvalds	45d9a2220f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "Unfortunately, this merge window it'll have a be a lot of small piles - my fault, actually, for not keeping #for-next in anything that would resemble a sane shape ;-/ This pile: assorted fixes (the first 3 are -stable fodder, IMO) and cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last components) + several long-standing patches from various folks. There definitely will be a lot more (starting with Miklos' check_submount_and_drop() series)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) direct-io: Handle O_(D)SYNC AIO direct-io: Implement generic deferred AIO completions add formats for dentry/file pathnames kvm eventfd: switch to fdget powerpc kvm: use fdget switch fchmod() to fdget switch epoll_ctl() to fdget switch copy_module_from_fd() to fdget git simplify nilfs check for busy subtree ibmasmfs: don't bother passing superblock when not needed don't pass superblock to hypfs_{mkdir,create*} don't pass superblock to hypfs_diag_create_files don't pass superblock to hypfs_vm_create_files() oprofile: get rid of pointless forward declarations of struct super_block oprofilefs_create_...() do not need superblock argument oprofilefs_mkdir() doesn't need superblock argument don't bother with passing superblock to oprofile_create_stats_files() oprofile: don't bother with passing superblock to ->create_files() don't bother passing sb to oprofile_create_files() coh901318: don't open-code simple_read_from_buffer() ...	2013-09-05 08:50:26 -07:00
Al Viro	cffe78d92c	kvm eventfd: switch to fdget Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-03 23:04:45 -04:00
Christoffer Dall	8d98915b6b	ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs For bytemaps each IRQ field is 1 byte wide, so we pack 4 irq fields in one word and since there are 32 private (per cpu) irqs, we have 8 private u32 fields on the vgic_bytemap struct. We shift the offset from the base of the register group right by 2, giving us the word index instead of the field index. But then there are 8 private words, not 4, which is also why we subtract 8 words from the offset of the shared words. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-30 16:12:38 +03:00
Marc Zyngier	6545eae3d7	ARM: KVM: vgic: fix GICD_ICFGRn access All the code in handle_mmio_cfg_reg() assumes the offset has been shifted right to accomodate for the 2:1 bit compression, but this is only done when getting the register address. Shift the offset early so the code works mostly unchanged. Reported-by: Zhaobo (Bob, ERC) <zhaobo@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-30 16:12:16 +03:00
Marc Zyngier	986af8e078	ARM: KVM: vgic: simplify vgic_get_target_reg vgic_get_target_reg is quite complicated, for no good reason. Actually, it is fairly easy to write it in a much more efficient way by using the target CPU array instead of the bitmap. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-30 16:03:07 +03:00
Paolo Bonzini	c21fbff16b	KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp This is the type-safe comparison function, so the double-underscore is not related. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-28 09:39:40 +03:00
Andrea Arcangeli	11feeb4980	kvm: optimize away THP checks in kvm_is_mmio_pfn() The checks on PG_reserved in the page structure on head and tail pages aren't necessary because split_huge_page wouldn't transfer the PG_reserved bit from head to tail anyway. This was a forward-thinking check done in the case PageReserved was set by a driver-owned page mapped in userland with something like remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not possible right now). It was meant to be very safe, but it's overkill as it's unlikely split_huge_page could ever run without the driver noticing and tearing down the hugepage itself. And if a driver in the future will really want to map a reserved hugepage in userland using an huge pmd it should simply take care of marking all subpages reserved too to keep KVM safe. This of course would require such a hypothetical driver to tear down the huge pmd itself and splitting the hugepage itself, instead of relaying on split_huge_page, but that sounds very reasonable, especially considering split_huge_page wouldn't currently transfer the reserved bit anyway. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-27 11:01:10 +03:00
Yann Droneaud	24009b0549	kvm: use anon_inode_getfd() with O_CLOEXEC flag KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-26 13:19:35 +03:00
Paolo Bonzini	a343c9b767	KVM: introduce __kvm_io_bus_sort_cmp kvm_io_bus_sort_cmp is used also directly, not just as a callback for sort and bsearch. In these cases, it is handy to have a type-safe variant. This patch introduces such a variant, __kvm_io_bus_sort_cmp, and uses it throughout kvm_main.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-29 09:01:14 +02:00

1 2 3 4 5 ...

610 Commits