Commit Graph

12046 Commits

Author SHA1 Message Date
Ingo Molnar 6c529a266b Merge commit 'v2.6.37-rc7' into perf/core
Merge reason: Pick up the latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22 11:53:23 +01:00
Linus Torvalds 55ec86f848 Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32: Make sure we can map all of lowmem if we need to
  x86, vt-d: Handle previous faults after enabling fault handling
  x86: Enable the intr-remap fault handling after local APIC setup
  x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
  x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic
  x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
  bootmem: Add alloc_bootmem_align()
  x86, gcc-4.6: Use gcc -m options when building vdso
  x86: HPET: Chose a paranoid safe value for the ETIME check
  x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix off by one in perf_swevent_init()
  perf: Fix duplicate events with multiple-pmu vs software events
  ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO
  scripts/tags.sh: Add magic for trace-events
  tracing: Fix panic when lseek() called on "trace" opened for writing
2010-12-19 10:44:54 -08:00
Linus Torvalds 46bdfe6a50 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86: avoid high BIOS area when allocating address space
  x86: avoid E820 regions when allocating address space
  x86: avoid low BIOS area when allocating address space
  resources: add arch hook for preventing allocation in reserved areas
  Revert "resources: support allocating space within a region from the top down"
  Revert "PCI: allocate bus resources from the top down"
  Revert "x86/PCI: allocate space from the end of a region, not the beginning"
  Revert "x86: allocate space within a region top-down"
  Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
  PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-18 10:13:24 -08:00
Bjorn Helgaas a2c606d53a x86: avoid high BIOS area when allocating address space
This prevents allocation of the last 2MB before 4GB.

The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27

This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:30 -08:00
Bjorn Helgaas 4dc2287c18 x86: avoid E820 regions when allocating address space
When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.

On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS.  On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,

    BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
    pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]

If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory.  This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.

I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below).  That means we're vulnerable to BIOS defects that Windows would not
trip over.  For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:24 -08:00
Bjorn Helgaas 30919b0bf3 x86: avoid low BIOS area when allocating address space
This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas.  This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.

We previously avoided this area in pcibios_align_resource().  This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:17 -08:00
Bjorn Helgaas d14125ecfe Revert "x86/PCI: allocate space from the end of a region, not the beginning"
This reverts commit dc9887dc02.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:00:49 -08:00
Bjorn Helgaas 5e52f1c5e8 Revert "x86: allocate space within a region top-down"
This reverts commit 1af3c2e45e.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:00:43 -08:00
Linus Torvalds a6ac1f0af4 Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix preemption counter leak in kvm_timer_init()
  KVM: enlarge number of possible CPUID leaves
  KVM: SVM: Do not report xsave in supported cpuid
  KVM: Fix OSXSAVE after migration
2010-12-17 09:32:39 -08:00
H. Peter Anvin 147dd5610c x86-32: Make sure we can map all of lowmem if we need to
A relocatable kernel can be anywhere in lowmem -- and in the case of a
kdump kernel, is likely to be fairly high.  Since the early page
tables map everything from address zero up we need to make sure we
allocate enough brk that we can map all of lowmem if we need to.

Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD3ED.8070607@kernel.org>
2010-12-16 19:11:09 -08:00
Avi Kivity 3e26f23091 KVM: Fix preemption counter leak in kvm_timer_init()
Based on a patch from Thomas Meyer.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-12-16 12:39:31 +02:00
Peter Zijlstra 7639dae0ca perf, x86: Provide a PEBS capable cycle event
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:36:44 +01:00
Peter Zijlstra 2e80a82a49 perf: Dynamic pmu types
Extend the perf_pmu_register() interface to allow for named and
dynamic pmu types.

Because we need to support the existing static types we cannot use
dynamic types for everything, hence provide a type argument.

If we want to enumerate the PMUs they need a name, provide one.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.259707703@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:36:43 +01:00
Peter Zijlstra 4407204c5c perf, x86: Detect broken BIOSes that corrupt the PMU
Some BIOSes use PMU resources, which can cause various bugs:

 - Non-working or erratic PMU based statistics - the PMU can end up
   counting the wrong thing, resulting in misleading statistics

 - Profiling can stop working or it can profile the wrong thing

 - A non-working or erratic NMI watchdog that cannot be relied on

 - The kernel may disturb whatever thing the BIOS tries to use the
   PMU for - possibly causing hardware malfunction in extreme cases.

 - ... and other forms of potential misbehavior

Various forms of such misbehavior has been observed in practice - there are
BIOSes that just corrupt the PMU state, consequences be damned.

The PMU is a CPU resource that is handled by the kernel and the BIOS
stealing+corrupting it is not acceptable nor robust, so we detect it,
warn about it and further refuse to touch the PMU ourselves.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:36:42 +01:00
Ingo Molnar 006b20fe4c Merge branch 'perf/urgent' into perf/core
Merge reason: We want to apply a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:22:27 +01:00
Rusty Russell da32dac101 lguest: populate initial_page_table
Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.

In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables.  With
the first patch, the direct mapping tables used that memory:

Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000

I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...

2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.

This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir.  So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.

For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org
2010-12-16 17:03:15 +10:30
Rusty Russell bb4093deb2 lguest: restore boot speed
lguest is dumb and drops *all* the pagetables for set_pte (which is
only used for kernel mapping manipulation, so it's OK without highmem).

But it's used a lot in boot, too.  As a guest optimization, we
suppressed this flushing until the first page switch.  Now we have
initial_page_table, that happens much earlier, so extend the heuristic
to wait until we switch to something other than the swapper_pg_dir or
initial_page_table.

As measured on my laptop under kvm, this dropped the time-to-mount-root
from 48 seconds to 4.3 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-12-16 17:03:15 +10:30
Rusty Russell bb6f1d9a99 lguest: fix crash lguest_time_init
fe25c7fc2e "x86: lguest: Convert to new irq chip functions" converted
enable_lguest_irq() to take a struct irq_data *, but didn't fix the one
internal caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: x86@kernel.org
2010-12-16 17:03:14 +10:30
Randy Dunlap 52f6c5ad43 crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h
Add missing header file:

arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR'
arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-12-15 19:44:08 +08:00
Kenji Kaneshige 7f7fbf45c6 x86: Enable the intr-remap fault handling after local APIC setup
Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. And the current code enables the vt-d
fault handling before the setup_local_APIC(). And hence the APIC LDR registers
and data structure in the memory may not be initialized. So the vt-d fault
handling in logical xapic/x2apic modes were broken.

Fix this by enabling the vt-d fault handling in the end_local_APIC_setup()

A cleaner fix of enabling fault handling while enabling intr-remapping
will be addressed for v2.6.38. [ Enabling intr-remapping determines the
usage of x2apic mode and the apic mode determines the fault-handling
configuration. ]

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <20101201062244.541996375@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:53:32 -08:00
Kenji Kaneshige 086e8ced65 x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this
irq migration of the vt-d fault handling interrupt is broken.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:52:52 -08:00
Suresh Siddha 10340ae130 x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
Alignment of alloc_bootmem() depends on the value of
L1_CACHE_SHIFT. What we need here, however, is 64 byte alignment.  Use
alloc_bootmem_align() and explicitly specify the alignment instead.

This fixes a kernel boot crash reported by Jody when the cpu in .config
is set to MPENTIUMII but the kernel is booted on a xsave-capable CPU.

Reported-by: Jody Bruchon <jody@nctritech.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20101116212442.059967454@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
2010-12-13 16:13:11 -08:00
H. Peter Anvin de2a8cf98e x86, gcc-4.6: Use gcc -m options when building vdso
The vdso Makefile passes linker-style -m options not to the linker but
to gcc.  This happens to work with earlier gcc, but fails with gcc
4.6.  Pass gcc-style -m options, instead.

Note: all currently supported versions of gcc supports -m32, so there
is no reason to conditionalize it any more.

Reported-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>
Cc: <stable@kernel.org>
2010-12-13 16:08:37 -08:00
Don Zickus 5f29805a4f x86, watchdog: Compile fix when CONFIG_LOCAL_APIC not enabled
When adjusting the code to handle removing the old nmi watchdog,
I forgot to consider the compile case when the local apic is not
enabled.

This change fixes the following build error:

  arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <20101213153719.GD18577@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-13 18:23:23 +01:00
Thomas Gleixner f1c18071ad x86: HPET: Chose a paranoid safe value for the ETIME check
commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty)
chose 8 HPET cycles as a safe value for the ETIME check, as we had the
confirmation that the posted write to the comparator register is
delayed by two HPET clock cycles on Intel chipsets which showed
readback problems.

After that patch hit mainline we got reports from machines with newer
AMD chipsets which seem to have an even longer delay. See
http://thread.gmane.org/gmane.linux.kernel/1054283 and
http://thread.gmane.org/gmane.linux.kernel/1069458 for further
information.

Boris tried to come up with an ACPI based selection of the minimum
HPET cycles, but this failed on a couple of test machines. And of
course we did not get any useful information from the hardware folks.

For now our only option is to chose a paranoid high and safe value for
the minimum HPET cycles used by the ETIME check. Adjust the minimum ns
value for the HPET clockevent accordingly.

Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6>
Cc: Simon Kirby <sim@hostway.ca>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: John Stultz <johnstul@us.ibm.com>
2010-12-13 13:42:44 +01:00
Don Zickus 5dc3055879 x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctls
Originally adapted from Huang Ying's patch which moved the
unknown_nmi_panic to the traps.c file.  Because the old nmi
watchdog was deleted before this change happened, the
unknown_nmi_panic sysctl was lost.  This re-adds it.

Also, the nmi_watchdog sysctl was re-implemented and its
documentation updated accordingly.

Patch-inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10 00:01:06 +01:00
Don Zickus 96a84c20d6 lockup detector: Compile fixes from removing the old x86 nmi watchdog
My patch that removed the old x86 nmi watchdog broke other
arches.  This change reverts a piece of that patch and puts the
change in the correct spot.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: fweisbec@gmail.com
Cc: yinghai@kernel.org
LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10 00:01:06 +01:00
Thomas Gleixner 4720dd1b38 x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n
arch/x86/kernel/apic/io_apic.c: In function 'ack_apic_level':
arch/x86/kernel/apic/io_apic.c:2433: warning: unused variable 'desc'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <201010272107.o9RL7rse018212@imap1.linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 17:43:21 +01:00
Rakib Mullick 2c6cb1053a x86: Address 'unused' warning in hw_nmi.c again
arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used

commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under
ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced
by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog).

Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace
section again.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <AANLkTi=rcc38QzoKa6LFy4m++-p_9=Zt4_kDQE=GeKxf@mail.gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 16:06:52 +01:00
Peter Zijlstra c079c791c5 perf, amd: Remove the nb lock
Since all the hotplug stuff is serialized by the hotplug mutex,
do away with the amd_nb_lock.

Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08 20:16:30 +01:00
Andre Przywara 73c1160ce3 KVM: enlarge number of possible CPUID leaves
Currently the number of CPUID leaves KVM handles is limited to 40.
My desktop machine (AthlonII) already has 35 and future CPUs will
expand this well beyond the limit. Extend the limit to 80 to make
room for future processors.

KVM-Stable-Tag.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-12-08 17:28:38 +02:00
Joerg Roedel 24d1b15f72 KVM: SVM: Do not report xsave in supported cpuid
To support xsave properly for the guest the SVM module need
software support for it. As long as this is not present do
not report the xsave as supported feature in cpuid.
As a side-effect this patch moves the bit() helper function
into the x86.h file so that it can be used in svm.c too.

KVM-Stable-Tag.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-12-08 17:28:37 +02:00
Sheng Yang 3ea3aa8cf6 KVM: Fix OSXSAVE after migration
CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID
after migration.

KVM-Stable-Tag.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-12-08 17:28:31 +02:00
Linus Torvalds 6313e3c217 Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/pvclock: Zero last_value on resume

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf record: Fix eternal wait for stillborn child
  perf header: Don't assume there's no attr info if no sample ids is provided
  perf symbols: Figure out start address of kernel map from kallsyms
  perf symbols: Fix kallsyms kernel/module map splitting

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  nohz: Fix printk_needs_cpu() return value on offline cpus
  printk: Fix wake_up_klogd() vs cpu hotplug
2010-12-08 06:40:59 -08:00
Ingo Molnar 10a18d7dc0 Merge commit 'v2.6.37-rc5' into perf/core
Merge reason: Pick up the latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-07 07:49:51 +01:00
Masami Hiramatsu f984ba4eb5 kprobes: Use text_poke_smp_batch for unoptimizing
Use text_poke_smp_batch() on unoptimization path for reducing
the number of stop_machine() issues. If the number of
unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256),
kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks
optimizer for remaining probes.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:32 +01:00
Masami Hiramatsu cd7ebe2298 kprobes: Use text_poke_smp_batch for optimizing
Use text_poke_smp_batch() in optimization path for reducing
the number of stop_machine() issues. If the number of optimizing
probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes
first MAX_OPTIMIZE_PROBES probes and kicks optimizer for
remaining probes.

Changes in v5:
- Use kick_kprobe_optimizer() instead of directly calling
  schedule_delayed_work().
- Rescheduling optimizer outside of kprobe mutex lock.

Changes in v2:
- Allocate code buffer and parameters in arch_init_kprobes()
  instead of using static arraies.
- Merge previous max optimization limit patch into this patch.
  So, this patch introduces upper limit of optimization at
  once.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:31 +01:00
Masami Hiramatsu 7deb18dcf0 x86: Introduce text_poke_smp_batch() for batch-code modifying
Introduce text_poke_smp_batch(). This function modifies several
text areas with one stop_machine() on SMP. Because calling
stop_machine() is heavy task, it is better to aggregate
text_poke requests.

( Note: I've talked with Rusty about this interface, and
  he would not like to expand stop_machine() interface, since
  it is not for generic use. )

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:31 +01:00
Masami Hiramatsu 6274de4984 kprobes: Support delayed unoptimizing
Unoptimization occurs when a probe is unregistered or disabled,
and is heavy because it recovers instructions by using
stop_machine(). This patch delays unoptimization operations and
unoptimize several probes at once by using
text_poke_smp_batch(). This can avoid unexpected system slowdown
coming from stop_machine().

Changes in v5:
- Split this patch into several cleanup patches and this patch.
- Fix some text_mutex lock miss.
- Use bool instead of int for behavior flags.
- Add additional comment for (un)optimizing path.

Changes in v2:
- Use dynamic allocated buffers and params.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:30 +01:00
Linus Torvalds 11e8896474 Merge branch '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: unplug the emulated devices at resume time
  xen: fix save/restore for PV on HVM guests with pirq remapping
  xen: resume the pv console for hvm guests too
  xen: fix MSI setup and teardown for PV on HVM guests
  xen: use PHYSDEVOP_get_free_pirq to implement find_unbound_pirq
2010-12-03 11:30:57 -08:00
Linus Torvalds 8338fded13 Merge branches 'upstream/core' and 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: allocate irq descs on any NUMA node
  xen: prevent crashes with non-HIGHMEM 32-bit kernels with largeish memory
  xen: use default_idle
  xen: clean up "extra" memory handling some more

* 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: x86/32: perform initial startup on initial_page_table
  xen: don't bother to stop other cpus on shutdown/reboot
2010-12-03 10:08:52 -08:00
Jeremy Fitzhardinge 64141da587 vmalloc: eagerly clear ptes on vunmap
On stock 2.6.37-rc4, running:

  # mount lilith:/export /mnt/lilith
  # find  /mnt/lilith/ -type f -print0 | xargs -0 file

crashes the machine fairly quickly under Xen.  Often it results in oops
messages, but the couple of times I tried just now, it just hung quietly
and made Xen print some rude messages:

    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
    3000000000000000) for mfn 1d7058 (pfn 18fa7)
    (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
    1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
    (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04

Which means the domain tried to map a pagetable page RW, which would
allow it to map arbitrary memory, so Xen stopped it.  This is because
vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
finished with them, and those pages got recycled as pagetable pages
while still having these RW aliases.

Removing those mappings immediately removes the Xen-visible aliases, and
so it has no problem with those pages being reused as pagetable pages.
Deferring the TLB flush doesn't upset Xen because it can flush the TLB
itself as needed to maintain its invariants.

When unmapping a region in the vmalloc space, clear the ptes
immediately.  There's no point in deferring this because there's no
amortization benefit.

The TLBs are left dirty, and they are flushed lazily to amortize the
cost of the IPIs.

This specific motivation for this patch is an oops-causing regression
since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
of vm_map_ram() introduced in 56e4ebf877 ("NFS: readdir with vmapped
pages") .  XFS also uses vm_map_ram() and could cause similar problems.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-02 14:51:15 -08:00
Stefano Stabellini 512b109ec9 xen: unplug the emulated devices at resume time
Early after being resumed we need to unplug again the emulated devices.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-12-02 14:40:53 +00:00
Stefano Stabellini af42b8d12f xen: fix MSI setup and teardown for PV on HVM guests
When remapping MSIs into pirqs for PV on HVM guests, qemu is responsible
for doing the actual mapping and unmapping.
We only give qemu the desired pirq number when we ask to do the mapping
the first time, after that we should be reading back the pirq number
from qemu every time we want to re-enable the MSI.

This fixes a bug in xen_hvm_setup_msi_irqs that manifests itself when
trying to enable the same MSI for the second time: the old MSI to pirq
mapping is still valid at this point but xen_hvm_setup_msi_irqs would
try to assign a new pirq anyway.
A simple way to reproduce this bug is to assign an MSI capable network
card to a PV on HVM guest, if the user brings down the corresponding
ethernet interface and up again, Linux would fail to enable MSIs on the
device.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-12-02 14:34:25 +00:00
Ian Campbell 805e3f4950 xen: x86/32: perform initial startup on initial_page_table
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it.  This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.

This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-29 17:07:53 -08:00
Jeremy Fitzhardinge 31e323cca9 xen: don't bother to stop other cpus on shutdown/reboot
Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's
no need to do it manually.

In any case it will fail because all the IPI irqs have been pulled
down by this point, so the cross-CPU calls will simply hang forever.

Until change 76fac077db the function calls
were not synchronously waited for, so this wasn't apparent.  However after
that change the calls became synchronous leading to a hang on shutdown
on multi-VCPU guests.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Alok Kataria <akataria@vmware.com>
2010-11-29 14:16:53 -08:00
Linus Torvalds a9e40a2493 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix the software context switch counter
  perf, x86: Fixup Kconfig deps
  x86, perf, nmi: Disable perf if counters are not accessible
  perf: Fix inherit vs. context rotation bug
2010-11-28 12:25:02 -08:00
Jeremy Fitzhardinge e7a3481c02 x86/pvclock: Zero last_value on resume
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).

Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-28 09:33:20 +01:00
Linus Torvalds fbe6c4047f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  dmar, x86: Use function stubs when CONFIG_INTR_REMAP is disabled
  x86-64: Fix and clean up AMD Fam10 MMCONF enabling
  x86: UV: Address interrupt/IO port operation conflict
  x86: Use online node real index in calulate_tbl_offset()
  x86, asm: Fix binutils 2.15 build failure
2010-11-27 07:28:47 +09:00
Linus Torvalds d2f30c73ab Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf symbols: Remove incorrect open-coded container_of()
  perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
  x86/kprobes: Prevent kprobes to probe on save_args()
  irq_work: Drop cmpxchg() result
  perf: Fix owner-list vs exit
  x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
  tracing: Fix recursive user stack trace
  perf,hw_breakpoint: Initialize hardware api earlier
  x86: Ignore trap bits on single step exceptions
  tracing: Force arch_local_irq_* notrace for paravirt
  tracing: Fix module use of trace_bprintk()
2010-11-27 07:28:17 +09:00