Commit Graph

319 Commits

Author SHA1 Message Date
Marc-André Lureau
0ec7b3e7f2 char: rename CharDriverState Chardev
Pick a uniform chardev type name.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:59 +01:00
Phil Dennis-Jordan
0b564e6f53 pc: Enable vmware-cpuid-freq CPU option for 2.9+ machine types
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Message-Id: <1484921496-11257-4-git-send-email-phil@philjordan.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:58 +01:00
Laszlo Ersek
b8bab8eb69 hw/isa/lpc_ich9: negotiate SMI broadcast on pc-q35-2.9+ machine types
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20170126014416.11211-4-lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:31 +01:00
Laszlo Ersek
5ce45c7a2b hw/isa/lpc_ich9: add broadcast SMI feature
The generic edk2 SMM infrastructure prefers
EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If
Trigger() only brings the current processor into SMM, then edk2 handles it
in the following ways:

(1) If Trigger() is executed by the BSP (which is guaranteed before
    ExitBootServices(), but is not necessarily true at runtime), then:

    (a) If edk2 has been configured for "traditional" SMM synchronization,
        then the BSP sends directed SMIs to the APs with APIC delivery,
        bringing them into SMM individually. Then the BSP runs the SMI
        handler / dispatcher.

    (b) If edk2 has been configured for "relaxed" SMM synchronization,
        then the APs that are not already in SMM are not brought in, and
        the BSP runs the SMI handler / dispatcher.

(2) If Trigger() is executed by an AP (which is possible after
    ExitBootServices(), and can be forced e.g. by "taskset -c 1
    efibootmgr"), then the AP in question brings in the BSP with a
    directed SMI, and the BSP runs the SMI handler / dispatcher.

The smaller problem with (1a) and (2) is that the BSP and AP
synchronization is slow. For example, the "taskset -c 1 efibootmgr"
command from (2) can take more than 3 seconds to complete, because
efibootmgr accesses non-volatile UEFI variables intensively.

The larger problem is that QEMU's current behavior diverges from the
behavior usually seen on physical hardware, and that keeps exposing
obscure corner cases, race conditions and other instabilities in edk2,
which generally expects / prefers a software SMI to affect all CPUs at
once.

Therefore introduce the "broadcast SMI" feature that causes QEMU to inject
the SMI on all VCPUs.

While the original posting of this patch
<http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html>
only intended to speed up (2), based on our recent "stress testing" of SMM
this patch actually provides functional improvements.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20170126014416.11211-3-lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:31 +01:00
Laszlo Ersek
50de920b37 hw/isa/lpc_ich9: add SMI feature negotiation via fw_cfg
Introduce the following fw_cfg files:

- "etc/smi/supported-features": a little endian uint64_t feature bitmap,
  presenting the features known by the host to the guest. Read-only for
  the guest.

  The content of this file will be determined via bit-granularity ICH9-LPC
  device properties, to be introduced later. For now, the bitmask is left
  zeroed. The bits will be set from machine type compat properties and on
  the QEMU command line, hence this file is not migrated.

- "etc/smi/requested-features": a little endian uint64_t feature bitmap,
  representing the features the guest would like to request. Read-write
  for the guest.

  The guest can freely (re)write this file, it has no direct consequence.
  Initial value is zero. A nonzero value causes the SMI-related fw_cfg
  files and fields that are under guest influence to be migrated.

- "etc/smi/features-ok": contains a uint8_t value, and it is read-only for
  the guest. When the guest selects the associated fw_cfg key, the guest
  features are validated against the host features. In case of error, the
  negotiation doesn't proceed, and the "features-ok" file remains zero. In
  case of success, the "features-ok" file becomes (uint8_t)1, and the
  negotiated features are locked down internally (to which no further
  changes are possible until reset).

  The initial value is zero.  A nonzero value causes the SMI-related
  fw_cfg files and fields that are under guest influence to be migrated.

The C-language fields backing the "supported-features" and
"requested-features" files are uint8_t arrays. This is because they carry
guest-side representation (our choice is little endian), while
VMSTATE_UINT64() assumes / implies host-side endianness for any uint64_t
fields. If we migrate a guest between hosts with different endiannesses
(which is possible with TCG), then the host-side value is preserved, and
the host-side representation is translated. This would be visible to the
guest through fw_cfg, unless we used plain byte arrays. So we do.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20170126014416.11211-2-lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:31 +01:00
Pavel Dovgalyuk
07bfa35477 apic: save apic_delivered flag
This patch implements saving/restoring of static apic_delivered variable.

v8: saving static variable only for one of the APICs

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20170126123429.5412.94368.stgit@PASHA-ISP>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27 18:07:30 +01:00
Igor Mammedov
80e5db303d machine: Make possible_cpu_arch_ids() return const pointer
make sure that external callers won't try to modify
possible_cpus and owner of possible_cpus can access
it directly when it modifies it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1484759609-264075-5-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-01-23 21:25:37 -02:00
Marcelo Tosatti
abc62c89f3 pc.h: move x-mach-use-reliable-get-clock compat entry to PC_COMPAT_2_8
As noticed by David Gilbert, commit 6053a86 'kvmclock: reduce kvmclock
differences on migration' added 'x-mach-use-reliable-get-clock' and a
compatibility entry that turns it off; however it got merged after 2.8.0
was released but the entry has gone into PC_COMPAT_2_7 where it should
have gone into PC_COMPAT_2_8.

Fix it by moving the entry to PC_COMPAT_2_8.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170118175343.GA26873@amt.cnet>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-20 13:22:57 +01:00
Jason Wang
554f5e1604 intel_iommu: support device iotlb descriptor
This patch enables device IOTLB support for intel iommu. The major
work is to implement QI device IOTLB descriptor processing and notify
the device through iommu notifier.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-01-10 05:56:58 +02:00
Marcelo Tosatti
6053a86fe7 kvmclock: reduce kvmclock difference on migration
Check for KVM_CAP_ADJUST_CLOCK capability KVM_CLOCK_TSC_STABLE, which
indicates that KVM_GET_CLOCK returns a value as seen by the guest at
that moment.

For new machine types, use this value rather than reading
from guest memory.

This reduces kvmclock difference on migration from 5s to 0.1s
(when max_downtime == 5s).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <20161121105052.598267440@redhat.com>
[Add comment explaining what is going on. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:56 +01:00
Chao Peng
feddd2fd91 pc: make pit configurable
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <1478330391-74060-4-git-send-email-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:25 +01:00
Chao Peng
272f042877 pc: make sata configurable
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <1478330391-74060-3-git-send-email-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:25 +01:00
Chao Peng
be232eb076 pc: make smbus configurable
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <1478330391-74060-2-git-send-email-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:25 +01:00
Dr. David Alan Gilbert
f9f885b78a migration/pcspk: Turn migration of pcspk off for 2.7 and older
To keep backwards migration compatibility allow us to turn pcspk
migration off.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20161128133201.16104-3-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-28 16:45:12 +01:00
Igor Mammedov
e3cadac073 pc: fix FW_CFG_NB_CPUS to account for -device added CPUs
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1479301481-197333-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-11-16 12:10:00 -02:00
Igor Mammedov
eabff15820 Revert "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs"
This reverts commit 080ac219cc.

Legacy FW_CFG_NB_CPUS will be reused instead of 'etc/boot-cpus'
fw_cfg file since it does the same and there is no point
to maintaing duplicate guest ABI, if it can be helped.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1479212236-183810-2-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-11-16 12:09:53 -02:00
Peter Xu
1a43713b02 intel_iommu: fix several incorrect endianess and bit fields
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-11-15 17:20:36 +02:00
Wei Liu
021746c131 PCMachineState: introduce acpi_build_enabled field
Introduce this field to control whether ACPI build is enabled by a
particular machine or accelerator.

It defaults to true if the machine itself supports ACPI build. Xen
accelerator will disable it because Xen is in charge of building ACPI
tables for the guest.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
2016-11-02 12:26:12 -07:00
Anand J
814bb12a56 clean-up: removed duplicate #includes
Some files contain multiple #includes of the same header file.
Removed most of those unnecessary duplicate entries using
scripts/clean-includes.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Anand J <anand.indukala@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-28 18:17:24 +03:00
Igor Mammedov
080ac219cc pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
Currently firmware uses 1 byte at 0x5F offset in RTC CMOS
to get number of CPUs present at boot. However 1 byte is
not enough to handle more than 255 CPUs.  So add a new
fw_cfg file that would allow QEMU to tell it.
For compat reasons add file only for machine types that
support more than 255 CPUs.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 17:29:15 -02:00
Igor Mammedov
33d7a28829 pc: apic_common: Extend APIC ID property to 32bit
ACPI ID is 32 bit wide on CPUs with x2APIC support.
Extend 'id' property to support it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 17:29:15 -02:00
Radim Krčmář
fb506e701e intel_iommu: reject broken EIM
Cluster x2APIC cannot work without KVM's x2apic API when the maximal
APIC ID is greater than 8 and only KVM's LAPIC can support x2APIC, so we
forbid other APICs and also the old KVM case with less than 9, to
simplify the code.

There is no point in enabling EIM in forbidden APICs, so we keep it
enabled only for the KVM APIC;  unconditionally, because making the
option depend on KVM version would be a maintanance burden.

Old QEMUs would enable eim whenever intremap was on, which would trick
guests into thinking that they can enable cluster x2APIC even if any
interrupt destination would get clamped to 8 bits.
Depending on your configuration, QEMU could notice that the destination
LAPIC is not present and report it with a very non-obvious:

  KVM: injection failed, MSI lost (Operation not permitted)

Or the guest could say something about unexpected interrupts, because
clamping leads to aliasing so interrupts were being delivered to
incorrect VCPUs.

KVM_X2APIC_API is the feature that allows us to enable EIM for KVM.

QEMU 2.7 allowed EIM whenever interrupt remapping was enabled.  In order
to keep backward compatibility, we again allow guests to misbehave in
non-obvious ways, and make it the default for old machine types.

A user can enable the buggy mode it with "x-buggy-eim=on".

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
e6b6af0560 intel_iommu: add OnOffAuto intr_eim as "eim" property
The default (auto) emulates the current behavior.
A user can now control EIM like
  -device intel-iommu,intremap=on,eim=off

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
267ee35715 apic: add send_msi() to APICCommonClass
The MMIO based interface to APIC doesn't work well with MSIs that have
upper address bits set (remapped x2APIC MSIs).  A specialized interface
is a quick and dirty way to avoid the shortcoming.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
2f114315dc apic: add global apic_get_class()
Every configuration has only up to one APIC class and we'll be extending
the class with a function that can be called without an instanced
object, so a direct access to the class is convenient.

This patch will break compilation if some code uses apic_get_class()
with CONFIG_USER_ONLY.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Peter Maydell
86e121ae75 * Thread Sanitizer fixes (Alex)
* Coverity fixes (David)
 * test-qht fixes (Emilio)
 * QOM interface for info irq/info pic (Hervé)
 * -rtc clock=rt fix (Junlian)
 * mux chardev fixes (Marc-André)
 * nicer report on death by signal (Michal)
 * qemu-tech TLC (Paolo)
 * MSI support for edu device (Peter)
 * qemu-nbd --offset fix (Tomáš)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJX98xmFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 IXsH/idLNlBzbrGhcuZOXEAd4fCyCyhXGMuOAGJXLHgv+EfiqrJ9z4HTn44czdh7
 rJuQDYeDrfl36zc0n8weY7JSEsorCq+JBDomFUFodmCrFUIue2jXYOK6pt5LUrQM
 OTyruQMKHD316SnJFOK8Tkxi5DrAHNRs+ynDcm+IoB65KE9YgBcBWuEJ03mF9cHi
 5sb/SBEqfL49gVlnFXBDTRgXXwA5axS7xKd4+7CWtbVFvJxurImjywGqKI5G/dmC
 TJyP+Dty4iNjFP1E0VvfL6ETovncZlfe4Hx1b971pll/ec88jGL0brqQMPjACrWh
 TyLXLN9oTbEKuDxx1Nh23xRFh+c=
 =sgtZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Thread Sanitizer fixes (Alex)
* Coverity fixes (David)
* test-qht fixes (Emilio)
* QOM interface for info irq/info pic (Hervé)
* -rtc clock=rt fix (Junlian)
* mux chardev fixes (Marc-André)
* nicer report on death by signal (Michal)
* qemu-tech TLC (Paolo)
* MSI support for edu device (Peter)
* qemu-nbd --offset fix (Tomáš)

# gpg: Signature made Fri 07 Oct 2016 17:25:10 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (39 commits)
  qemu-doc: merge qemu-tech and qemu-doc
  qemu-tech: rewrite some parts
  qemu-tech: reorganize content
  qemu-tech: move TCG test documentation to tests/tcg/README
  qemu-tech: move user mode emulation features from qemu-tech
  qemu-tech: document lazy condition code evaluation in cpu.h
  qemu-tech: move text from qemu-tech to tcg/README
  qemu-doc: drop installation and compilation notes
  qemu-doc: replace introduction with the one from the internals manual
  qemu-tech: drop index
  test-qht: perform lookups under rcu_read_lock
  qht: fix unlock-after-free segfault upon resizing
  qht: simplify qht_reset_size
  qemu-nbd: Shrink image size by specified offset
  qemu_kill_report: Report PID name too
  util: Introduce qemu_get_pid_name
  char: update read handler in all cases
  char: use a fixed idx for child muxed chr
  i8259: give ISA device when registering ISA ioports
  .travis.yml: add gcc sanitizer build
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-10 10:39:29 +01:00
Hervé Poussineau
61b97833b3 intc: make HMP 'info irq' and 'info pic' commands use InterruptStatsProvider interface
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Message-Id: <1474921408-24710-6-git-send-email-hpoussin@reactos.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-04 10:00:25 +02:00
Evgeny Yakovlev
339892d758 target-i386: Correct family/model/stepping for Opteron_G3
Current CPU definition for AMD Opteron third generation includes
features like SSE4a and LAHF_LM support in emulated CPUID. These
features are present in K8 rev.E or K10 CPUs and later. However,
current G3 family and model describe 2nd generation K8 cores instead.

This is incorrect but was considered harmless until our tests found a
problem with linux kernels >= 3.10 (and maybe earlier) which specifically
check for Opteron K8 model when parsing CPUID leaf 0x80000001:
http://lxr.free-electrons.com/source/arch/x86/kernel/cpu/amd.c?v=3.16#L552
This code will disable LAHF_LM feature in /proc/cpuinfo if model number
is inconsistent.

This change sets Opteron_G3 family/model/stepping to 16/2/3 which is
a proper Opteron 3rd generation 2350 CPU.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-03 16:06:43 -03:00
Eduardo Habkost
c39c0edf9b target-i386: Automatically set level/xlevel/xlevel2 when needed
Instead of requiring users and management software to be aware of
required CPUID level/xlevel/xlevel2 values for each feature,
automatically increase those values when features need them.

This was already done for CPUID[7].EBX, and is now made generic
for all CPUID feature flags. Unit test included, to make sure we
don't break ABI on older machine-types and don't mess with the
CPUID level values if they are explicitly set by the user.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27 16:17:17 -03:00
David Kiarie
fb9f592623 hw/i386: AMD IOMMU IVRS table
Add IVRS table for AMD IOMMU. Generate IVRS or DMAR
depending on emulated IOMMU.

Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-24 01:02:01 +03:00
Igor Mammedov
152fcbecad target-i386: turn off CPU.l3-cache only for 2.7 and older machine types
commit (14c985cff target-i386: present virtual L3 cache info for vcpus)
misplaced compat property putting it in new 2.8 machine type
which would effectively to disable feature until 2.9 is released.
Intent of commit probably should be to disable feature for 2.7
and older while allowing not yet released 2.8 to have feature
enabled by default.

Cc: qemu-stable@nongnu.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-23 18:51:40 +03:00
Igor Mammedov
2e0910329b pc: clean up COMPAT macro chaining
Since commit
 bacc344c ("machine: add properties to compat_props incrementaly")
there is no need to chain per machine type compat macro.

Clean up places where it was done anyway so it will be
consistent and won't confuse contributors during addtion
of new machine types.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-23 18:51:40 +03:00
Ladi Prosek
d4b84d564e Remove unused function declarations
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-15 15:32:22 +03:00
Longpeng(Mike)
14c985cffa target-i386: present virtual L3 cache info for vcpus
Some software algorithms are based on the hardware's cache info, for example,
for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger
a resched IPI and told cpu2 to do the wakeup if they don't share low level
cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc.
The relevant linux-kernel code as bellow:

	static void ttwu_queue(struct task_struct *p, int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		......
		if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
			......
			ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
			return;
		}
		......
		ttwu_do_activate(rq, p, 0); /* access target's rq directly */
		......
	}

In real hardware, the cpus on the same socket share L3 cache, so one won't
trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a
virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs
under some workloads even if the virtual cpus belongs to the same virtual socket.

For KVM, there will be lots of vmexit due to guest send IPIs.
The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates)
and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during
the period:
        No-L3           With-L3(applied this patch)
cpu0:	363890		44582
cpu1:	373405		43109
cpu2:	340783		43797
cpu3:	333854		43409
cpu4:	327170		40038
cpu5:	325491		39922
cpu6:	319129		42391
cpu7:	306480		41035
cpu8:	161139		32188
cpu9:	164649		31024
cpu10:	149823		30398
cpu11:	149823		32455
cpu12:	164830		35143
cpu13:	172269		35805
cpu14:	179979		33898
cpu15:	194505		32754
avg:	268963.6	40129.8

The VM's topology is "1*socket 8*cores 2*threads".
After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
reduce 85%.

For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause
severe performance degradation. We had tested the overall system performance if
vcpus actually run on sparate physical socket. With L3 cache, the performance
improves 7.2%~33.1%(avg:15.7%).

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Longpeng(Mike)
a4d3c83476 pc: Add 2.8 machine
This will used by the next patch.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Marc-André Lureau
3e6c0c4c2c pc: keep gsi reference
Further cleanup would need to call qemu_free_irq() at the appropriate
time, but for now this silences ASAN about direct leaks.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-09-08 18:05:21 +04:00
Marc-André Lureau
8ea753718b machine: use class base init generated name
machine_class_base_init() member name is allocated by
machine_class_base_init(), but not freed by
machine_class_finalize().  Simply freeing there doesn't work,
because DEFINE_PC_MACHINE() overwrites it with a literal string.

Fix DEFINE_PC_MACHINE() not to overwrite it, and add the missing
free to machine_class_finalize().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-09-08 18:05:21 +04:00
Marc-André Lureau
d80fe99de4 pc: simplify passing qemu_irq
qemu_irq is already a pointer, no need to have an extra pointer level.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08 18:05:21 +04:00
Peter Xu
20fd4b7b6d x86: ioapic: add support for explicit EOI
Some old Linux kernels (upstream before v4.0), or any released RHEL
kernels has problem in sending APIC EOI when IR is enabled. Meanwhile,
many of them only support explicit EOI for IOAPIC, which is only
introduced in IOAPIC version 0x20. This patch provide a way to boost
QEMU IOAPIC to version 0x20, in order for QEMU to correctly receive EOI
messages.

Without boosting IOAPIC version to 0x20, kernels before commit d32932d
("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
will have trouble enabling both IR and level-triggered interrupt devices
(like e1000).

To upgrade IOAPIC to version 0x20, we need to specify:

  -global ioapic.version=0x20

To be compatible with old systems, 0x11 will still be the default IOAPIC
version. Here 0x11 and 0x20 are the only versions to be supported.

One thing to mention: this patch only applies to emulated IOAPIC. It
does not affect kernel IOAPIC behavior.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1470059959-372-1-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-03 18:44:57 +02:00
Igor Mammedov
7298d4fd51 apic: fix broken migration for kvm-apic
commit f6e98444 (apic: Use apic_id as apic's migration instance_id)
breaks migration when in kernel irqchip is used for 2.6 and older
machine types.

It applies compat property only for userspace 'apic' type
instead of applying it to all apic types inherited from
'apic-common' type as it was supposed to do.

Fix it by setting compat property 'legacy-instance-id' for
'apic-common' type which affects inherited types (i.e. not
only 'apic' but also 'kvm-apic' types)

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1469800542-11402-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-03 18:44:57 +02:00
Peter Maydell
206d0c2436 pc, pci, virtio: new features, cleanups, fixes
- interrupt remapping for intel iommus
 - a bunch of virtio cleanups
 - fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXkQsqAAoJECgfDbjSjVRpanoIAJ9JVlc1aEjt9sa0cSBcs+NQ
 J7JmgU9FqFsj+4FrNTouO3AxTjHurd1UAULP1WMPD+V3JpbnHct8r6SCBLQ5EBMN
 VOjYo4DwWs1g+DqnQ9WZmbadu06XvYi/yiAKNUzWfZk0MR11D0D/S5hmarNKw0Kq
 tGHeTWjGeY4WqFLV7m+qB4+cqkAByn6um99UtUvgLL05RgIEIP2IEMKYZ+rXvAa9
 iGUvzqlO7mbq/+LbL18kaWywa4TCwbbd2eSGWaqhX4CuB62Rl33mWTXFcfaYhkyp
 Z3FgwaJ09h0lAjSVEbyAuLFMfO/BnMcsoKqwl4xc4vkn/xBCqFtgH9JcEVm3O8U=
 =ge2D
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: new features, cleanups, fixes

- interrupt remapping for intel iommus
- a bunch of virtio cleanups
- fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 21 Jul 2016 18:49:30 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (57 commits)
  intel_iommu: avoid unnamed fields
  virtio: Update migration docs
  virtio-gpu: Wrap in vmstate
  virtio-gpu: Use migrate_add_blocker for virgl migration blocking
  virtio-input: Wrap in vmstate
  9pfs: Wrap in vmstate
  virtio-serial: Wrap in vmstate
  virtio-net: Wrap in vmstate
  virtio-balloon: Wrap in vmstate
  virtio-rng: Wrap in vmstate
  virtio-blk: Wrap in vmstate
  virtio-scsi: Wrap in vmstate
  virtio: Migration helper function and macro
  virtio-serial: Remove old migration version support
  virtio-net: Remove old migration version support
  virtio-scsi: Replace HandleOutput typedef
  Revert "mirror: Workaround for unexpected iohandler events during completion"
  virtio-scsi: Call virtio_add_queue_aio
  virtio-blk: Call virtio_add_queue_aio
  virtio: Introduce virtio_add_queue_aio
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-21 20:12:37 +01:00
Michael S. Tsirkin
bc38ee10fc intel_iommu: avoid unnamed fields
Also avoid unnamed fields for portability.
Also, rename VTD_IRTE to VTD_IR_TableEntry for coding
style compliance.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:20 +03:00
Peter Xu
ede9c94acf intel_iommu: add SID validation for IR
This patch enables SID validation. Invalid interrupts will be dropped.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:16 +03:00
Jan Kiszka
28589311b3 intel_iommu: Add support for Extended Interrupt Mode
As neither QEMU nor KVM support more than 255 CPUs so far, this is
simple: we only need to switch the destination ID translation in
vtd_remap_irq_get if EIME is set.

Once CFI support is there, it will have to take EIM into account as
well. So far, nothing to do for this.

This patch allows to use x2APIC in split irqchip mode of KVM.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
[use le32_to_cpu() to retrieve dest_id]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:43:49 +03:00
Peter Xu
e3d9c92507 ioapic: register IOMMU IEC notifier for ioapic
Let IOAPIC the first consumer of x86 IOMMU IEC invalidation
notifiers. This is only used for split irqchip case, when vIOMMU
receives IR invalidation requests, IOAPIC will be notified to update
kernel irq routes. For simplicity, we just update all IOAPIC routes,
even if the invalidated entries are not IOAPIC ones.

Since now we are creating IOMMUs using "-device" parameter, IOMMU
device will be created after IOAPIC.  We need to do the registration
after machine done by leveraging machine_done notifier.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:43:49 +03:00
Peter Xu
02a2cbc872 x86-iommu: introduce IEC notifiers
This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
invalidation notifier list. When vIOMMU receives IEC invalidate
request, all the registered units will be notified with specific
invalidation requests.

Intel IOMMU is the first provider that generates such a event.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:43:49 +03:00
Peter Xu
8b5ed7dffa intel_iommu: add support for split irqchip
In split irqchip mode, IOAPIC is working in user space, only update
kernel irq routes when entry changed. When IR is enabled, we directly
update the kernel with translated messages. It works just like a kernel
cache for the remapping entries.

Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
long as we can support split irqchip, we will support irqfd as
well. Also, since kernel gsi routes will cache translated interrupts,
irqfd delivery will not suffer from any performance impact due to IR.

And, since we supported irqfd, vhost devices will be able to work
seamlessly with IR now. Logically this should contain both vhost-net and
vhost-user case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[move trace-events lines into target-i386/trace-events]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:43:49 +03:00
Peter Xu
cb135f59b8 q35: ioapic: add support for emulated IOAPIC IR
This patch translates all IOAPIC interrupts into MSI ones. One pseudo
ioapic address space is added to transfer the MSI message. By default,
it will be system memory address space. When IR is enabled, it will be
IOMMU address space.

Currently, only emulated IOAPIC is supported.

Idea suggested by Jan Kiszka and Rita Sinha in the following patch:

https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg01933.html

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:43:49 +03:00
Peter Xu
651e4cefee intel_iommu: Add support for PCI MSI remap
This patch enables interrupt remapping for PCI devices.

To play the trick, one memory region "iommu_ir" is added as child region
of the original iommu memory region, covering range 0xfeeXXXXX (which is
the address range for APIC). All the writes to this range will be taken
as MSI, and translation is carried out only when IR is enabled.

Idea suggested by Paolo Bonzini.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-20 19:31:04 +03:00
Peter Xu
1f91acee17 intel_iommu: define several structs for IOMMU IR
Several data structs are defined to better support the rest of the
patches: IRTE to parse remapping table entries, and IOAPIC/MSI related
structure bits to parse interrupt entries to be filled in by guest
kernel.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-20 19:30:27 +03:00