Commit Graph

57462 Commits

Author SHA1 Message Date
Fenghua Yu 2f19e06ac3 x86, mem: memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
Support memset() with enhanced rep stosb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep stosb
overrides the fast string alternative memset_c and the original function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:31 -07:00
Fenghua Yu 057e05c1d6 x86, mem: memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB
Support memmove() by enhanced rep movsb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memmove() function using enhanced rep movsb
overrides the original function.

The patch doesn't change the backward memmove case to use enhanced rep
movsb.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:30 -07:00
Fenghua Yu 101068c1f4 x86, mem: memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
Support memcpy() with enhanced rep movsb. On processors supporting enhanced
rep movsb, the alternative memcpy() function using enhanced rep movsb overrides the original function and the fast string
function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:29 -07:00
Fenghua Yu 4307bec934 x86, mem: copy_user_64.S: Support copy_to/from_user by enhanced REP MOVSB/STOSB
Support copy_to_user/copy_from_user() by enhanced REP MOVSB/STOSB.
On processors supporting enhanced REP MOVSB/STOSB, the alternative
copy_user_enhanced_fast_string function using enhanced rep movsb overrides the
original function and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:28 -07:00
Fenghua Yu e365c9df2f x86, mem: clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB
Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Support clear_page() with rep stosb for processor supporting enhanced REP MOVSB
/STOSB. On processors supporting enhanced REP MOVSB/STOSB, the alternative
clear_page_c_e function using enhanced REP STOSB overrides the original function
and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:27 -07:00
Fenghua Yu 9072d11da1 x86, alternative: Add altinstruction_entry macro
Add altinstruction_entry macro to generate .altinstructions section
entries from assembly code.  This should be less failure-prone than
open-coding.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:25 -07:00
Fenghua Yu 5097313363 x86, alternative, doc: Add comment for applying alternatives order
Some string operation functions may be patched twice, e.g. on enhanced REP MOVSB
/STOSB processors, memcpy is patched first by fast string alternative function,
then it is patched by enhanced REP MOVSB/STOSB alternative function.

Add comment for applying alternatives order to warn people who may change the
applying alternatives order for any reason.

[ Documentation-only patch ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:25 -07:00
Fenghua Yu 161ec53c70 x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB
If kernel intends to use enhanced REP MOVSB/STOSB, it must ensure
IA32_MISC_ENABLE.Fast_String_Enable (bit 0) is set and CPUID.(EAX=07H, ECX=0H):
EBX[bit 9] also reports 1.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:23 -07:00
Fenghua Yu 724a92ee45 x86, cpufeature: Add CPU feature bit for enhanced REP MOVSB/STOSB
Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 14:56:36 -07:00
Rafael J. Wysocki c650da23d5 PM: Remove CONFIG_PM_VERBOSE
Now that we have CONFIG_DYNAMIC_DEBUG there is no need for yet
another flag causing dev_dbg() and pr_debug() statements in the
core PM code to produce output.  Moreover, CONFIG_PM_VERBOSE
causes so much output to be generated that it's not really useful
and almost no one sets it.

References: https://bugzilla.kernel.org/show_bug.cgi?id=23182
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17 23:25:10 +02:00
Rafael J. Wysocki 290c748725 Merge branch 'power-domains' into for-linus
* power-domains:
  PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset
  PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops"
  OMAP1 / PM: Use generic clock manipulation routines for runtime PM
  PM / Runtime: Generic clock manipulation rountines for runtime PM (v6)
  PM / Runtime: Add subsystem data field to struct dev_pm_info
  OMAP2+ / PM: move runtime PM implementation to use device power domains
  PM / Platform: Use generic runtime PM callbacks directly
  shmobile: Use power domains for platform runtime PM
  PM: Export platform bus type's default PM callbacks
  PM: Make power domain callbacks take precedence over subsystem ones
2011-05-17 23:23:46 +02:00
Rafael J. Wysocki 2d2a9163bd Merge branch 'syscore' into for-linus
* syscore:
  PM: Remove sysdev suspend, resume and shutdown operations
  PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
  PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
  PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
  PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM
  ARM / Samsung: Use struct syscore_ops for "core" power management
  ARM / PXA: Use struct syscore_ops for "core" power management
  ARM / SA1100: Use struct syscore_ops for "core" power management
  ARM / Integrator: Use struct syscore_ops for core PM
  ARM / OMAP: Use struct syscore_ops for "core" power management
  ARM: Use struct syscore_ops instead of sysdevs for PM in common code
2011-05-17 23:23:40 +02:00
Amerigo Wang c3b0795c98 PM / ACPI: Remove acpi_sleep=s4_nonvs
acpi_sleep=s4_nonvs is superseded by acpi_sleep=nonvs, so remove it.

Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17 23:19:18 +02:00
Fenghua Yu 2494b030ba x86, cpufeature: Fix cpuid leaf 7 feature detection
CPUID leaf 7, subleaf 0 returns the maximum subleaf in EAX, not the
number of subleaves.  Since so far only subleaf 0 is defined (and only
the EBX bitfield) we do not need to qualify the test.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305660806-17519-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org> 2.6.36..39
2011-05-17 13:36:29 -07:00
Borislav Petkov 14fb57dccb x86, AMD: Fix ARAT feature setting again
Trying to enable the local APIC timer on early K8 revisions
uncovers a number of other issues with it, in conjunction with
the C1E enter path on AMD. Fixing those causes much more churn
and troubles than the benefit of using that timer brings so
don't enable it on K8 at all, falling back to the original
functionality the kernel had wrt to that.

Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Nick Bowler <nbowler@elliptictech.com>
Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17 15:28:34 +02:00
Borislav Petkov 328935e634 Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"
This reverts commit e20a2d205c, as it crashes
certain boxes with specific AMD CPU models.

Moving the lower endpoint of the Erratum 400 check to accomodate
earlier K8 revisions (A-E) opens a can of worms which is simply
not worth to fix properly by tweaking the errata checking
framework:

* missing IntPenging MSR on revisions < CG cause #GP:

http://marc.info/?l=linux-kernel&m=130541471818831

* makes earlier revisions use the LAPIC timer instead of the C1E
idle routine which switches to HPET, thus not waking up in
deeper C-states:

http://lkml.org/lkml/2011/4/24/20

Therefore, leave the original boundary starting with K8-revF.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17 15:28:33 +02:00
Ingo Molnar 86b9523ab1 Merge branch 'gart/rename' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2011-05-17 14:39:00 +02:00
David Rientjes dc382fd5bc x86, mm: Allow ZONE_DMA to be configurable
ZONE_DMA is unnecessary for a large number of machines that do not
require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI
cards with a restricted DMA address mask.

This patch allows users to disable ZONE_DMA for x86 if they know they
will not be using such devices with their kernel.

This prevents the VM from unnecessarily reserving a ratio of memory
(defaulting to 1/256th of system capacity) with lowmem_reserve_ratio
for such allocations when it will never be used.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 14:03:28 -07:00
Ondrej Zary 865be7a810 x86, cpu: Fix detection of Celeron Covington stepping A1 and B0
Steppings A1 and B0 of Celeron Covington are currently misdetected as
Pentium II (Dixon). Fix it by removing the stepping check.

[ hpa: this fixes this specific bug... the CPUID documentation
  specifies that the L2 cache size can disambiguate additional CPUs;
  this patch does not fix that. ]

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Link: http://lkml.kernel.org/r/201105162138.15416.linux@rainbow-software.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 13:24:21 -07:00
Martin Schwidefsky f296388682 ftrace/s390: mcount offset calculation
Do the mcount offset adjustment in the recordmcount.pl/recordmcount.[ch]
at compile time and not in ftrace_call_adjust at run time.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 15:05:06 -04:00
Martin Schwidefsky 521ccb5c4a ftrace/x86: mcount offset calculation
Do the mcount offset adjustment in the recordmcount.pl/recordmcount.[ch]
at compile time and not in ftrace_call_adjust at run time.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 14:55:57 -04:00
Steven Rostedt 2895cd2ab8 ftrace/x86: Do not trace .discard.text section
The section called .discard.text has tracing attached to it and is
currently ignored by ftrace. But it does include a call to the mcount
stub. Adding a notrace to the code keeps gcc from adding the useless
mcount caller to it.

Link: http://lkml.kernel.org/r/20110421023739.243651696@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 14:47:13 -04:00
Frank Arnold 42be450565 x86, AMD, cacheinfo: Fix L3 cache index disable checks
We provide two slots to disable cache indices, and have a check to
prevent both slots to be used for the same index.

If the user disables the same index on different subcaches, both slots
will hold the same index, e.g.

  $ echo 2047 > /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_0
  $ cat /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_0
  2047
  $ echo 1050623 > /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_1
  $ cat /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_1
  2047

due to the fact that the check was looking only at index bits [11:0]
and was ignoring writes to bits outside that range. The more correct
fix is to simply check whether the index is within the bounds of
[0..l3->indices].

While at it, cleanup comments and drop now-unused local macros.

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
Link: http://lkml.kernel.org/r/1305553188-21061-3-git-send-email-bp@amd64.org
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 11:24:27 -07:00
Borislav Petkov 50e7534427 x86, AMD, cacheinfo: Fix fallout caused by max3 conversion
732eacc054 converted code around the
kernel using nested max() macros to use the new max3 macro but forgot to
remove the old line in intel_cacheinfo.c. Fix it.

Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Frank Arnold <farnold@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305553188-21061-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 11:24:23 -07:00
Rafael J. Wysocki 600b776eb3 OMAP1 / PM: Use generic clock manipulation routines for runtime PM
Convert OMAP1 to using the new generic clock manipulation routines
and a device power domain for runtime PM instead of overriding the
platform bus type's runtime PM callbacks.  This allows us to simplify
OMAP1-specific code and to share some code with other platforms
(shmobile in particular).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Kevin Hilman <khilman@ti.com>
2011-05-16 20:15:36 +02:00
Konrad Rzeszutek Wilk 7c1bfd685b xen/pci: Fix compiler error when CONFIG_XEN_PRIVILEGED_GUEST is not set.
If we have CONFIG_XEN and the other parameters to build an
Linux kernel that is non-privileged, the xen_[find|register|unregister]_
device_domain_owner functions should not be compiled. They should
use the nops defined in arch/x86/include/asm/xen/pci.h instead.

This fixes:

arch/x86/pci/xen.c:496: error: redefinition of ‘xen_find_device_domain_owner’
arch/x86/include/asm/xen/pci.h:25: note: previous definition of ‘xen_find_device_domain_owner’ was here
arch/x86/pci/xen.c:510: error: redefinition of ‘xen_register_device_domain_owner’
arch/x86/include/asm/xen/pci.h:29: note: previous definition of ‘xen_register_device_domain_owner’ was here
arch/x86/pci/xen.c:532: error: redefinition of ‘xen_unregister_device_domain_owner’
arch/x86/include/asm/xen/pci.h:34: note: previous definition of ‘xen_unregister_device_domain_owner’ was here

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-05-16 13:47:30 -04:00
Linus Torvalds df8d06ade6 Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  OMAP3: set the core dpll clk rate in its set_rate function
  omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured
2011-05-16 08:55:49 -07:00
Youquan Song e503f9e4b0 x86, apic: Fix spurious error interrupts triggering on all non-boot APs
This patch fixes a bug reported by a customer, who found
that many unreasonable error interrupts reported on all
non-boot CPUs (APs) during the system boot stage.

According to Chapter 10 of Intel Software Developer Manual
Volume 3A, Local APIC may signal an illegal vector error when
an LVT entry is set as an illegal vector value (0~15) under
FIXED delivery mode (bits 8-11 is 0), regardless of whether
the mask bit is set or an interrupt actually happen. These
errors are seen as error interrupts.

The initial value of thermal LVT entries on all APs always reads
0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
sequence to them and LVT registers are reset to 0s except for
the mask bits which are set to 1s when APs receive INIT IPI.

When the BIOS takes over the thermal throttling interrupt,
the LVT thermal deliver mode should be SMI and it is required
from the kernel to keep AP's LVT thermal monitoring register
programmed as such as well.

This issue happens when BIOS does not take over thermal throttling
interrupt, AP's LVT thermal monitor register will be restored to
0x10000 which means vector 0 and fixed deliver mode, so all APs will
signal illegal vector error interrupts.

This patch check if interrupt delivery mode is not fixed mode before
restoring AP's LVT thermal monitor register.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
Cc: hpa@linux.intel.com
Cc: joe@perches.com
Cc: jbaron@redhat.com
Cc: trenn@suse.de
Cc: kent.liu@intel.com
Cc: chaohong.guo@intel.com
Cc: <stable@kernel.org> # As far back as possible
Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-16 13:48:25 +02:00
Thomas Gleixner a18f22a968 Merge branch 'consolidate-clksrc-i8253' of master.kernel.org:~rmk/linux-2.6-arm into timers/clocksource
Conflicts:
	arch/ia64/kernel/cyclone.c
	arch/mips/kernel/i8253.c
	arch/x86/kernel/i8253.c

Reason: Resolve conflicts so further cleanups do not conflict further

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-14 12:06:36 +02:00
Russell King 798778b865 clocksource: convert mips to generic i8253 clocksource
Convert MIPS i8253 clocksource code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-14 10:29:48 +01:00
Russell King 82491451dd clocksource: convert x86 to generic i8253 clocksource
Convert x86 i8253 clocksource code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-14 10:29:48 +01:00
Russell King 8c414ff3f4 clocksource: convert footbridge to generic i8253 clocksource
Convert the footbridge isa-timer code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-14 10:29:48 +01:00
Michael Cree 90b57f3516 alpha: Wire up syscalls new to 2.6.39
Wire up the syscalls:
   name_to_handle_at
   open_by_handle_at
   clock_adjtime
   syncfs
and adjust some whitespace in the neighbourhood to align commments.

Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-05-13 19:16:11 -04:00
John Stultz f550806a7f alpha: convert to clocksource_register_hz
Converts alpha to use clocksource_register_hz.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-05-13 19:16:10 -04:00
Greg Kroah-Hartman 82a3242e11 sysfs: remove "last sysfs file:" line from the oops messages
On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.

This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.

So it's time to delete the line.  This is good as we need all the space
we can get for oops messages at times on consoles.

Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-13 16:05:51 -07:00
Julia Lawall d9a5ac9ef3 x86, mce, AMD: Fix leaving freed data in a list
b may be added to a list, but is not removed before being freed
in the case of an error.  This is done in the corresponding
deallocation function, so the code here has been changed to
follow that.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E,E1,E2;
identifier l;
@@

*list_add(&E->l,E1);
... when != E1
    when != list_del(&E->l)
    when != list_del_init(&E->l)
    when != E = E2
*kfree(E);// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-13 17:11:02 +02:00
Avinash H.M 5fd2a84ab3 OMAP3: set the core dpll clk rate in its set_rate function
The debug l3_ick/rate is not displaying the actual rate of the clock in
hardware. This is because, the core dpll set_rate function doesn't update the
clk.rate. After fixing, the l3_ick/rate is displaying proper values.

Signed-off-by: Shweta Gulati <shweta.gulati@ti.com>
Signed-off-by: Avinash.H.M <avinashhm@ti.com>
Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Paul Wamsley <paul@pwsan.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2011-05-13 07:08:18 -07:00
Cliff Wickman 77ed23f8d9 x86: Fix UV BAU for non-consecutive nasids
This is a fix for the SGI Altix-UV Broadcast Assist Unit code,
which is used for TLB flushing.

Certain hardware configurations (that customers are ordering)
cause nasids (numa address space id's) to be non-consecutive.
Specifically, once you have more than 4 blades in a IRU
(Individual Rack Unit - or 1/2 rack) but less than the maximum
of 16, the nasid numbering becomes non-consecutive.  This
currently results in a 'catastrophic error' (CATERR) detected by
the firmware during OS boot.  The BAU is generating an 'INTD'
request that is targeting a non-existent nasid value. Such
configurations may also occur when a blade is configured off
because of hardware errors. (There is one UV hub per blade.)

This patch is required to support such configurations.

The problem with the tlb_uv.c code is that is using the
consecutive hub numbers as indices to the BAU distribution bit
map. These are simply the ordinal position of the hub or blade
within its partition.  It should be using physical node numbers
(pnodes), which correspond to the physical nasid values. Use of
the hub number only works as long as the nasids in the partition
are consecutive and increase with a stride of 1.

This patch changes the index to be the pnode number, thus
allowing nasids to be non-consecutive.
It also provides a table in local memory for each cpu to
translate target cpu number to target pnode and nasid.
And it improves naming to properly reflect 'node' and 'uvhub'
versus 'nasid'.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-12 23:45:42 +02:00
Daniel Kiper ae15a3b4d1 arch/x86/xen/setup: Cleanup code/data sections definitions
Cleanup code/data sections definitions
accordingly to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:34 -04:00
Daniel Kiper ad3062a0f4 arch/x86/xen/enlighten: Cleanup code/data sections definitions
Cleanup code/data sections definitions
accordingly to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:34 -04:00
Daniel Kiper 251511a18d arch/x86/xen/irq: Cleanup code/data sections definitions
Cleanup code/data sections definitions
accordingly to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:33 -04:00
Linus Torvalds 0c5e1577f1 Merge branch 'stable/bug-fixes-for-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-for-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  x86/mm: Fix section mismatch derived from native_pagetable_reserve()
  x86,xen: introduce x86_init.mapping.pagetable_reserve
  Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""
2011-05-12 12:21:51 -07:00
Konrad Rzeszutek Wilk 8c5950881c xen/p2m: Create entries in the P2M_MFN trees's to track 1-1 mappings
.. when applicable. We need to track in the p2m_mfn and
p2m_mfn_p the MFNs and pointers, respectivly, for the P2M entries
that are allocated for the identity mappings. Without this,
a PV domain with an E820 that triggers the 1-1 mapping to kick in,
won't be able to be restored as the P2M won't have the identity
mappings.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:38:53 -04:00
Daniel Kiper 0f16d0dfcd xen/setup: Fix for incorrect xen_extra_mem_start initialization under 32-bit
git commit 24bdb0b62c (xen: do not create
the extra e820 region at an addr lower than 4G) does not take into
account that ifdef CONFIG_X86_32 instead of e820_end_of_low_ram_pfn()
find_low_pfn_range() is called (both calls are from arch/x86/kernel/setup.c).
find_low_pfn_range() behaves correctly and does not require change in
xen_extra_mem_start initialization. Additionally, if xen_extra_mem_start
is initialized in the same way as ifdef CONFIG_X86_64 then memory hotplug
support for Xen balloon driver (under development) is broken.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:37:06 -04:00
Konrad Rzeszutek Wilk 15bfc09451 xen/setup: Ignore E820_UNUSABLE when setting 1-1 mappings.
When we parse the raw E820, the Xen hypervisor can set "E820_RAM"
to "E820_UNUSABLE" if the mem=X argument is used. As such we
should _not_ consider the E820_UNUSABLE as an 1-1 identity
mapping, but instead use the same case as for E820_RAM.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:32:13 -04:00
Tian, Kevin 7899891c7d xen mmu: fix a race window causing leave_mm BUG()
There's a race window in xen_drop_mm_ref, where remote cpu may exit
dirty bitmap between the check on this cpu and the point where remote
cpu handles drop request. So in drop_other_mm_ref we need check
whether TLB state is still lazy before calling into leave_mm. This
bug is rarely observed in earlier kernel, but exaggerated by the
commit 831d52bc15
("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
which clears bitmap after changing the TLB state. the call trace is as below:

---------------------------------
kernel BUG at arch/x86/mm/tlb.c:61!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
CPU 1
Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285
RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
RSP: e02b:ffff88002805be48  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0
RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200
R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880
R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000
FS:  00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000 00
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40)
Stack:
 ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88
<0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78
<0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108
Call Trace:
 <IRQ>
 [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53
 [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
 [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28
 [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
 [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
 [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d
 [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46
 [<ffffffff81013efe>] xen_do_hyper visor_callback+0x1e/0x30
 <EOI>
 [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255
 [<ffffffff81114362>] ? do_execve+0x1c3/0x29e
 [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
 [<ffffffff 8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
 [<ffffffff81013daa>] ? child_rip+0xa/0x20
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81013da0>] ? child_rip+0x0/0x20
Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
 RSP <ffff88002805be48>
---[ end trace ce9cee6832a9c503 ]---

Tested-by: Maoxiaoyun<tinnycloud@hotmail.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
[v1: Fleshed out the git description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:27:43 -04:00
Sedat Dilek 53f8023feb x86/mm: Fix section mismatch derived from native_pagetable_reserve()
With CONFIG_DEBUG_SECTION_MISMATCH=y I see these warnings in next-20110415:

  LD      vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x1ba48): Section mismatch in reference from the function native_pagetable_reserve() to the function .init.text:memblock_x86_reserve_range()
The function native_pagetable_reserve() references
the function __init memblock_x86_reserve_range().
This is often because native_pagetable_reserve lacks a __init
annotation or the annotation of memblock_x86_reserve_range is wrong.

This patch fixes the issue.
Thanks to pipacs from PaX project for help on IRC.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:05:05 -04:00
Stefano Stabellini 279b706bf8 x86,xen: introduce x86_init.mapping.pagetable_reserve
Introduce a new x86_init hook called pagetable_reserve that at the end
of init_memory_mapping is used to reserve a range of memory addresses for
the kernel pagetable pages we used and free the other ones.

On native it just calls memblock_x86_reserve_range while on xen it also
takes care of setting the spare memory previously allocated
for kernel pagetable pages from RO to RW, so that it can be used for
other purposes.

A detailed explanation of the reason why this hook is needed follows.

As a consequence of the commit:

commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason we need a hook to reserve the kernel pagetable pages we
used and free the other ones so that they can be reused for other
purposes.
On native it just means calling memblock_x86_reserve_range, on Xen it
also means marking RW the pagetable pages that we allocated before but
that haven't been used before.

Another way to fix this is without using the hook is by adding a 'if
(xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
counterpart, but that is just nasty.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:05:04 -04:00
Konrad Rzeszutek Wilk 92bdaef7b2 Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""
This reverts commit a38647837a.

It does not work with certain AMD machines.

last_pfn = 0x100000 max_arch_pfn = 0x400000000
initial memory mapped : 0 - 02c3a000
Base memory trampoline at [ffff88000009b000] 9b000 size 20480
init_memory_mapping: 0000000000000000-0000000100000000
 0000000000 - 0100000000 page 4k
kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
init_memory_mapping: 0000000100000000-00000001e0800000
 0100000000 - 01e0800000 page 4k
kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
xen: setting RW the range fffdc000 - 100000000
RAMDISK: 0203b000 - 02c3a000
No NUMA configuration found
Faking a node at 0000000000000000-00000001e0800000
NUMA: Using 63 for the hash shift.
Initmem setup node 0 0000000000000000-00000001e0800000
  NODE_DATA [00000001dfffb000 - 00000001dfffffff]
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
PGD 0
Oops: 0003 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
Stack:
 0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
 ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
 ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
Call Trace:
 [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6f67>] numa_init+0x6c/0x70
 [<ffffffff81cf7057>] initmem_init+0x39/0x3b
 [<ffffffff81ce5865>] setup_arch+0x64e/0x769
 [<ffffffff815e43c1>] ? printk+0x51/0x53
 [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
 [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
 [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
 RSP <ffffffff81c01e38>
CR2: 0000000000000000
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:04:29 -04:00
Linus Torvalds 8043f4eb85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic
  sparc32: fix sparcstation 5 boot
  sparc32: fix section mismatch warnings in apc, pmc and time_32
2011-05-12 07:53:34 -07:00
Linus Torvalds 75c0b3b466 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses
  ARM: 6892/1: handle ptrace requests to change PC during interrupted system calls
  ARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM
  ARM: zImage: the page table memory must be considered before relocation
  ARM: zImage: make sure not to relocate on top of the relocation code
  ARM: zImage: Fix bad SP address after relocating kernel
  ARM: zImage: make sure the stack is 64-bit aligned
  ARM: RiscPC: acornfb: fix section mismatches
  ARM: RiscPC: etherh: fix section mismatches
2011-05-12 07:53:06 -07:00
Richard Weinberger 449a66fd1f x86: Remove warning and warning_symbol from struct stacktrace_ops
Both warning and warning_symbol are nowhere used.
Let's get rid of them.

Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Soeren Sandmann Pedersen <ssp@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: x86 <x86@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Link: http://lkml.kernel.org/r/1305205872-10321-2-git-send-email-richard@nod.at
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2011-05-12 15:31:28 +02:00
Catalin Marinas a904f5f9eb ARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses
Since mandatory barriers may be used (explicitly or implicitly via readl
etc.) to ensure the ordering between Device and Normal memory accesses,
a DMB is not enough. This patch converts it to a DSB.

Cc: Colin Cross <ccross@android.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-12 10:52:00 +01:00
Arnd Bergmann 2af68df02f ARM: 6892/1: handle ptrace requests to change PC during interrupted system calls
GDB's interrupt.exp test cases currenly fail on ARM.  The problem is how do_signal
handled restarting interrupted system calls:

The entry.S assembler code determines that we come from a system call; and that
information is passed as "syscall" parameter to do_signal.  That routine then
calls get_signal_to_deliver [*] and if a signal is to be delivered, calls into
handle_signal.  If a system call is to be restarted either after the signal
handler returns, or if no handler is to be called in the first place, the PC
is updated after the get_signal_to_deliver call, either in handle_signal (if
we have a handler) or at the end of do_signal (otherwise).

Now the problem is that during [*], the call to get_signal_to_deliver, a ptrace
intercept may happen.  During this intercept, the debugger may change registers,
including the PC.  This is done by GDB if it wants to execute an "inferior call",
i.e. the execution of some code in the debugged program triggered by GDB.

To this purpose, GDB will save all registers, allocate a stack frame, set up
PC and arguments as appropriate for the call, and point the link register to
a dummy breakpoint instruction.  Once the process is restarted, it will execute
the call and then trap back to the debugger, at which point GDB will restore
all registers and continue original execution.

This generally works fine.  However, now consider what happens when GDB attempts
to do exactly that while the process was interrupted during execution of a to-be-
restarted system call:  do_signal is called with the syscall flag set; it calls
get_signal_to_deliver, at which point the debugger takes over and changes the PC
to point to a completely different place.  Now get_signal_to_deliver returns
without a signal to deliver; but now do_signal decides it should be restarting
a system call, and decrements the PC by 2 or 4 -- so it now points to 2 or 4
bytes before the function GDB wants to call -- which leads to a subsequent crash.

To fix this problem, two things need to be supported:
- do_signal must be able to recognize that get_signal_to_deliver changed the PC
  to a different location, and skip the restart-syscall sequence
- once the debugger has restored all registers at the end of the inferior call
  sequence, do_signal must recognize that *now* it needs to restart the pending
  system call, even though it was now entered from a breakpoint instead of an
  actual svc instruction

This set of issues is solved on other platforms, usually by one of two
mechanisms:

- The status information "do_signal is handling a system call that may need
  restarting" is itself carried in some register that can be accessed via
  ptrace.  This is e.g. on Intel the "orig_eax" register; on Sparc the kernel
  defines a magic extra bit in the flags register for this purpose.
  This allows GDB to manage that state: reset it when doing an inferior call,
  and restore it after the call is finished.

- On s390, do_signal transparently handles this problem without requiring
  GDB interaction, by performing system call restarting in the following
  way: first, adjust the PC as necessary for restarting the call.  Then,
  call get_signal_to_deliver; and finally just continue execution at the
  PC.  This way, if GDB does not change the PC, everything is as before.
  If GDB *does* change the PC, execution will simply continue there --
  and once GDB restores the PC it saved at that point, it will automatically
  point to the *restarted* system call.  (There is the minor twist how to
  handle system calls that do *not* need restarting -- do_signal will undo
  the PC change in this case, after get_signal_to_deliver has returned, and
  only if ptrace did not change the PC during that call.)

Because there does not appear to be any obvious register to carry the
syscall-restart information on ARM, we'd either have to introduce a new
artificial ptrace register just for that purpose, or else handle the issue
transparently like on s390.  The patch below implements the second option;
using this patch makes the interrupt.exp test cases pass on ARM, with no
regression in the GDB test suite otherwise.

Cc: patches@linaro.org
Signed-off-by: Ulrich Weigand <ulrich.weigand@linaro.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-12 10:52:00 +01:00
Will Deacon 9af386c8dc ARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM
The SPARSEMEM code allocates memmap entries only for sections which are
present (i.e. those which contain some valid memory). The membank checks
in free_unused_memmap do not take this into account and can incorrectly
attempt to free memory which is not allocated, resulting in a BUG() in
the bootmem code.

However, if memory is configured as follows:

    |<----section---->|<----hole---->|<----section---->|
    +--------+--------+--------------+--------+--------+
    | bank 0 | unused |              | bank 1 | unused |
    +--------+--------+--------------+--------+--------+

where a bank only occupies part of a section, the memmap allocated for
the remainder of the section *can* be freed.

This patch modifies the checks in free_unused_memmap so that only valid
memmap entries are considered for removal.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-12 10:52:00 +01:00
Ingo Molnar 9cb5baba5e Merge commit 'v2.6.39-rc7' into sched/core 2011-05-12 09:36:18 +02:00
Tkhai Kirill b1054282d7 sparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic
When we are in the label cc_dword_align, registers %o0 and %o1 have the same last 2 bits,
but it's not guaranteed one of them is zero. So we can get unaligned memory access
in label ccte. Example of parameters which lead to this:
%o0=0x7ff183e9, %o1=0x8e709e7d, %g1=3

With the parameters I had a memory corruption, when the additional 5 bytes were rewritten.
This patch corrects the error.

One comment to the patch. We don't care about the third bit in %o1, because cc_end_cruft
stores word or less.

Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-11 21:35:04 -07:00
Linus Torvalds 409ab140e2 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] fix alloc_pgste check in init_new_context
  [S390] oprofile: fix min/max interval query checks
  [S390] replace diag10() with diag10_range() function
  [S390] disassembler: handle b280/spp instruction
  [S390] kernel: Initialize register 14 when starting new CPU
  [S390] dasd: prevent IO error during reserve/release loop
  [S390] sclp/memory hotplug: fix initial usecount of increments
2011-05-11 18:59:45 -07:00
Rafael J. Wysocki 2e711c04db PM: Remove sysdev suspend, resume and shutdown operations
Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them.  Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly.  This reduces kernel code size quite a bit and reduces
its complexity.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-11 21:37:15 +02:00
Rafael J. Wysocki f5a592f7d7 PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
Make some PowerPC architecture's code use struct syscore_ops
objects for power management instead of sysdev classes and sysdevs.

This simplifies the code and reduces the kernel's memory footprint.
It also is necessary for removing sysdevs from the kernel entirely in
the future.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-11 21:37:15 +02:00
Rafael J. Wysocki f98bf4aa39 PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
Make some UNICORE32 architecture's code use struct syscore_ops
objects for power management instead of sysdev classes and sysdevs.

This simplifies the code and reduces the kernel's memory footprint.
It also is necessary for removing sysdevs from the kernel entirely in
the future.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
2011-05-11 21:37:15 +02:00
Rafael J. Wysocki f25f4f522a PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
Convert some AVR32 architecture's code to using struct syscore_ops
objects for power management instead of sysdev classes and sysdevs.

This simplifies the code and reduces the kernel's memory footprint.
It also is necessary for removing sysdevs from the kernel entirely in
the future.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
2011-05-11 21:37:14 +02:00
Laurent Pinchart c56b2ddd5f omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured
Commit d594f1f31a (omap: IOMMU: add
support to callback during fault handling) broke interrupt line sharing
between the OMAP3 ISP and its IOMMU. Because of this, every interrupt
generated by the OMAP3 ISP is handled by the IOMMU driver instead of
being passed to the OMAP3 ISP driver.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2011-05-11 10:47:50 -07:00
Jiri Olsa 9bbeacf52f kprobes, x86: Disable irqs during optimized callback
Disable irqs during optimized callback, so we dont miss any in-irq kprobes.

The following commands:

 # cd /debug/tracing/
 # echo "p mutex_unlock" >> kprobe_events
 # echo "p _raw_spin_lock" >> kprobe_events
 # echo "p smp_apic_timer_interrupt" >> ./kprobe_events
 # echo 1 > events/enable

Cause the optimized kprobes to be missed. None is missed
with the fix applied.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20110511110613.GB2390@jolsa.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-11 13:21:23 +02:00
Ingo Molnar 2c14ddc135 Merge branch 'iommu/2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2011-05-10 22:15:57 +02:00
Manuel Lauss 780914c3cf MIPS: Alchemy: fix xxs1500 build error
This fixes:
alchemy/xxs1500/init.c: In function 'prom_init':
alchemy/xxs1500/init.c:57:17: error: ignoring return value of 'kstrtoul', declared with attribute warn_unused_result

Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
Cc: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2340/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
David Daney 310f130339 MIPS: Invalidate old TLB mappings when updating huge page PTEs.
Without this, stale Icache or TLB entries may be used.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
https://patchwork.linux-mips.org/patch/2318/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Wu Zhangjin f850548ef8 MIPS: Hibernation: Fixes for PAGE_SIZE >= 64kb
PAGE_SIZE >= 64kb (1 << 16) is too big to be the immediate of the
addiu/daddiu instruction, so, use addu/daddu instruction instead.

The following compiling error is fixed:

AS      arch/mips/power/hibernate.o
arch/mips/power/hibernate.S: Assembler messages:
arch/mips/power/hibernate.S:38: Error: expression out of range
make[2]: *** [arch/mips/power/hibernate.o] Error 1
make[1]: *** [arch/mips/power] Error 2

Reported-by: Roman Mamedov <rm@romanrm.ru>
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2313/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Lars-Peter Clausen 1e2bbde4af MIPS: JZ4740: Set one-shot feature flag for the clockevent
The code for supporting one-shot mode for the clockevent is already there,
only the feature flag was not set.  Setting the one-shot flag allows the
kernel to run in tickless mode.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2261/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Ralf Baechle aa7ce1c303 MIPS: JZ4740: Export symbols to the watchdog driver module
MODPOST 356 modules
ERROR: "jz4740_timer_disable_watchdog" [drivers/watchdog/jz4740_wdt.ko] undefine
d!
ERROR: "jz4740_timer_enable_watchdog" [drivers/watchdog/jz4740_wdt.ko] undefined
!
make[1]: *** [__modpost] Error 1

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Ralf Baechle f1b6a5054c MIPS: JZ4740: Fix GCC 4.6.0 build error.
CC      arch/mips/jz4740/dma.o
arch/mips/jz4740/dma.c: In function 'jz4740_dma_chan_irq':
arch/mips/jz4740/dma.c:245:11: error: variable 'status' set but not used [-Werro
r=unused-but-set-variable]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Ralf Baechle b20bff02b2 MIPS: Audit: Fix success success argument pass to audit_syscall_exit
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle 893d20fbae MIPS: Fix calc_vmlinuz_load_addr build warnings.
HOSTCC  arch/mips/boot/compressed/calc_vmlinuz_load_addr
arch/mips/boot/compressed/calc_vmlinuz_load_addr.c: In function 'main':
arch/mips/boot/compressed/calc_vmlinuz_load_addr.c:35:2: warning: format '%llx' expects type 'long long unsigned int *', but argument 3 has type 'uint64_t *'
arch/mips/boot/compressed/calc_vmlinuz_load_addr.c:54:2: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'uint64_t'

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle 403fbdff96 MIPS: Alchemy: Fix GCC 4.6.0 build error.
CC      arch/mips/alchemy/devboards/db1x00/board_setup.o
arch/mips/alchemy/devboards/db1x00/board_setup.c: In function 'board_setup':
arch/mips/alchemy/devboards/db1x00/board_setup.c:130:6: error: variable 'pin_func' set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle 8bdd51429d MIPS: Document former use of timerfd(2) syscall number.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle e12f47ef16 MIPS: IP27: Fix GCC 4.6.0 build error.
CC      arch/mips/sgi-ip27/ip27-hubio.o
arch/mips/sgi-ip27/ip27-hubio.c: In function 'hub_pio_map':
arch/mips/sgi-ip27/ip27-hubio.c:32:20: error: variable 'junk' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle a6ab5ca394 MIPS: IP27: Fix GCC 4.6.0 build error.
CC      arch/mips/sgi-ip27/ip27-hubio.o
arch/mips/sgi-ip27/ip27-hubio.c: In function 'hub_pio_map':
arch/mips/sgi-ip27/ip27-hubio.c:32:20: error: variable 'junk' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Jonas Gorski 7da34c1dac MIPS: bcm63xx: Fix header_crc comment in bcm963xx_tag.h
The CRC32 actually includes the tag_version.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2275/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
David Daney 23a271ecdf MIPS: Octeon: Guard the Kconfig body with CPU_CAVIUM_OCTEON
Instead of making each Octeon specific option depend on
CPU_CAVIUM_OCTEON, gate the body of the entire file with
CPU_CAVIUM_OCTEON.  With this change, CAVIUM_OCTEON_SPECIFIC_OPTIONS
becomes useless, so get rid of it as well.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2091/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
David Daney e3fb3f27a7 MIPS: Octeon: Cleanup Kconfig IRQ_CPU* symbols.
Octeon doesn't use IRQ_CPU, so don't select it.

IRQ_CPU_OCTEON is a completely unused symbol, remove it completely.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2086/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
Catalin Marinas f8bec75acd MIPS: Rename .data..mostly and properly handle it in linker script
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
Ralf Baechle 866d7f5622 MIPS: MSP: Fix build error
Reported and original patch by Yoichi Yuasa <yuasa@linux-mips.org>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
Yoichi Yuasa 088a42acc4 MIPS: MSP71xx: Fix typo in msp_per_irq_controller
CC      arch/mips/pmc-sierra/msp71xx/msp_irq_per.o
arch/mips/pmc-sierra/msp71xx/msp_irq_per.c:101:2: error: expected identifier before '.' token
make[2]: *** [arch/mips/pmc-sierra/msp71xx/msp_irq_per.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2246/
Cc: linux-mips <linux-mips@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle c87444af6f MIPS: Loongson: Fix GCC 2.6.0 build error.
CC      arch/mips/loongson/common/env.o
arch/mips/loongson/common/env.c: In function 'prom_init_env':
arch/mips/loongson/common/env.c:50:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
arch/mips/loongson/common/env.c:51:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
arch/mips/loongson/common/env.c:52:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
arch/mips/loongson/common/env.c:53:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 84d3b0dbac MIPS: Jazz: Fix GCC 4.6.0 build error
CC      arch/mips/jazz/jazzdma.o
arch/mips/jazz/jazzdma.c: In function 'vdma_remap':
arch/mips/jazz/jazzdma.c:214:20: error: variable 'npages' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 11b9d0eca5 MIPS: SNI: Fix GCC 4.6.0 build error
CC      arch/mips/sni/time.o
arch/mips/sni/time.c: In function 'dosample':
arch/mips/sni/time.c:98:19: error: variable 'lsb' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 6be63bbbda MIPS: Malta: Fix GCC 4.6.0 build error
CC      arch/mips/mti-malta/malta-int.o
arch/mips/mti-malta/malta-int.c: In function 'mips_pcibios_iack':
arch/mips/mti-malta/malta-int.c:59:6: error: variable 'dummy' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle af3a1f6f48 MIPS: Malta: Fix GCC 4.6.0 build error
CC      arch/mips/mti-malta/malta-init.o
arch/mips/mti-malta/malta-init.c: In function 'prom_init':
arch/mips/mti-malta/malta-init.c:196:6: error: variable 'result' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 3be1afc8f6 MIPS: IP22: Fix GCC 4.6.0 build error
CC      arch/mips/sgi-ip22/ip22-platform.o
arch/mips/sgi-ip22/ip22-platform.c: In function 'sgiseeq_devinit':
arch/mips/sgi-ip22/ip22-platform.c:135:15: error: variable 'tmp' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

While at it rename the variable to pbdma for readability; there is a
local variable tmp of different type being used in two nested blocks.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Ralf Baechle 6fd78fc1fa MIPS: IP22: Fix GCC 4.6.0 build error
CC      arch/mips/sgi-ip22/ip22-time.o
arch/mips/sgi-ip22/ip22-time.c: In function 'dosample':
arch/mips/sgi-ip22/ip22-time.c:35:10: error: variable 'lsb' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Ralf Baechle 4a9040f451 MIPS: tlbex: Fix GCC 4.6.0 build error
CC      arch/mips/mm/tlbex.o
arch/mips/mm/tlbex.c: In function 'build_r4000_tlb_refill_handler':
arch/mips/mm/tlbex.c:1155:22: error: variable 'vmalloc_mode' set but not used [-Werror=unused-but-set-variable]
arch/mips/mm/tlbex.c:1154:28: error: variable 'htlb_info' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Ralf Baechle 71271aab8c MIPS: c-r4k: Fix GCC 4.6.0 build error
CC      arch/mips/mm/c-r4k.o
arch/mips/mm/c-r4k.c: In function 'probe_scache':
arch/mips/mm/c-r4k.c:1078:6: error: variable 'tmp' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Older GCC versions didn't warn about the unused variable tmp because it was
getting initialized.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
David Daney c54794d19e MIPS: Mask jump target in ftrace_dyn_arch_init_insns().
The current code is abusing the uasm interface by passing jump target
addresses with high bits set.  Mask the addresses to avoid annoying
messages at boot time.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wu Zhangjin <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/1922/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Joerg Roedel fffcda1183 x86, gart: Rename pci-gart_64.c to amd_gart_64.c
This file only contains code relevant for the northbridge
gart in AMD processors. This patch renames the file to
represent this fact in the filename.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 17:22:06 +02:00
Martin Schwidefsky badb8bb983 [S390] fix alloc_pgste check in init_new_context
Processes started with kernel_execve from a kernel thread will have
current->mm==NULL. Reading current->mm->context.alloc_pgste will
read a more or less random bit from lowcore in this case. If the
bit turns out to be set the whole process tree started this way
will allocate page table extensions although they have no need
for it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:43 +02:00
Martin Schwidefsky 3d8dcb3c76 [S390] oprofile: fix min/max interval query checks
oprofile_min_interval and oprofile_max_interval are unsigned, checking
for negative values doesn't work. Change hwsampler_query_min_interval
and hwsampler_query_max_interval to return an unsigned long and
check for a zero value instead.

Reported-by: Nicolas Kaiser <nikai@nikai.net>
Acked-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:43 +02:00
Michael Holzheu 83ace2701b [S390] replace diag10() with diag10_range() function
Currently the diag10() function can only release one page. For exploiters
that have to call diag10 on a contiguous memory region this is suboptimal.
This patch replaces the diag10() function with diag10_range() that is
able to release multiple pages. In addition to that the new function now
allows to release memory with addresses higher than 2047 MiB. This was
due to a restriction of the diagnose implementation under z/VM prior to
release 5.2.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:43 +02:00
Christian Borntraeger 91d378088b [S390] disassembler: handle b280/spp instruction
arch/s390/kvm/sie64a.S uses the b280 instruction. Tell the builtin
disassembler to handle that code.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:42 +02:00
Michael Holzheu 8eb4bd666f [S390] kernel: Initialize register 14 when starting new CPU
When starting a new CPU we currently jump to start_secondary() without
setting register 14 (the return address) correctly. Therefore on the stack
frame for start_secondary an invalid return address is stored. This leads
to wrong stack back traces in kernel dumps.

Example:

 #00 [1f33fe48] cpu_idle at 10614a
 #01 [1f33fe90] start_secondary at 54fa88
 #02 [1f33feb8] (null) at 0                 <--- invalid

To fix this start_secondary() is called now with basr/brasl that sets
register 14 correctly. The output of the stack backtrace looks then
like the following:

 #00 [1f33fe48] cpu_idle at 10614a
 #01 [1f33fe90] start_secondary at 54fa88
 #02 [1f33feb8] restart_base at 54f41e      <--- correct

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:42 +02:00
Ingo Molnar 932fed4e2e Merge commit 'v2.6.39-rc7' into perf/core
Merge reason: pull in the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 17:05:45 +02:00
Joerg Roedel 72fe00f01f x86/amd-iommu: Use threaded interupt handler
Move the interupt handling for the iommu into the interupt
thread to reduce latencies and prepare interupt handling for
pri handling.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 11:07:58 +02:00
Joerg Roedel 604c307bf4 Merge branches 'dma-debug/next', 'amd-iommu/command-cleanups', 'amd-iommu/ats' and 'amd-iommu/extended-features' into iommu/2.6.40
Conflicts:
	arch/x86/include/asm/amd_iommu_types.h
	arch/x86/kernel/amd_iommu.c
	arch/x86/kernel/amd_iommu_init.c
2011-05-10 10:25:23 +02:00
Joe Perches e969687595 arch/x86/kernel/pci-iommu_table.c: Convert sprintf_symbol to %pS
Coalesce format as well.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 10:21:35 +02:00
Jack Steiner 1d44e8288a x86, UV: Fix NMI handler for UV platforms
This fixes problems seen on UV systems handling NMIs from the
node controller.

I isolated the "dazed..." messages that I saw earlier to a bug in
the BMC on our platform. It was sending NMIs w/o properly setting
a register that indicated the source of NMI.

So rather than _assuming_ any unhandled NMI came from the UV system
maintenance console (SMC), add a check to verify that the SMC actually
sent the NMI.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: gorcunov@gmail.com
Cc: dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 09:26:55 +02:00
Matthew Garrett 935a638241 x86, efi: Ensure that the entirity of a region is mapped
It's possible for init_memory_mapping() to fail to map the entire region
if it crosses a boundary, so ensure that we complete the mapping.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-5-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:45 -07:00
Matthew Garrett 7cb00b7287 x86, efi: Pass a minimal map to SetVirtualAddressMap()
Experimentation with various EFI implementations has shown that functions
outside runtime services will still update their pointers if
SetVirtualAddressMap() is called with memory descriptors outside the
runtime area. This is obviously insane, and therefore is unsurprising.
Evidence from instrumenting another EFI implementation suggests that it
only passes the set of descriptors covering runtime regions, so let's
avoid any problems by doing the same. Runtime descriptors are copied to
a separate memory map, and only that map is passed back to the firmware.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-4-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:39 -07:00
Matthew Garrett 202f9d0a41 x86, efi: Merge contiguous memory regions of the same type and attribute
Some firmware implementations assume that physically contiguous regions
will be contiguous in virtual address space. This assumption is, obviously,
entirely unjustifiable. Said firmware implementations lack the good grace
to handle their failings in a measured and reasonable manner, instead
tending to shit all over address space and oopsing the kernel.

In an ideal universe these firmware implementations would simultaneously
catch fire and cease to be a problem, but since some of them are present
in attractively thin and shiny metal devices vanity wins out and some
poor developer spends an extended period of time surrounded by a
growing array of empty bottles until the underlying reason becomes
apparent. Said developer presents this patch, which simply merges
adjacent regions if they happen to be contiguous and have the same EFI
memory type and caching attributes.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-3-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:34 -07:00
Matthew Garrett 9cd2b07c19 x86, efi: Consolidate EFI nx control
The core EFI code and 64-bit EFI code currently have independent
implementations of code for setting memory regions as executable or not.
Let's consolidate them.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-2-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:29 -07:00
Matthew Garrett 2b5e8ef35b x86, efi: Remove virtual-mode SetVirtualAddressMap call
The spec says that SetVirtualAddressMap doesn't work once you're in
virtual mode, so there's no point in having infrastructure for calling
it from there.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-1-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:25 -07:00
Luis R. Rodriguez 8fab6af215 x86: Fix mrst sparse complaints
Fix these Sparse complaints:

  CHECK   arch/x86/platform/mrst/mrst.c
  arch/x86/platform/mrst/mrst.c:197:13: warning: symbol 'mrst_time_init' was not declared. Should it be static?
  arch/x86/platform/mrst/mrst.c:219:16: warning: symbol 'mrst_arch_setup' was not declared. Should it be static?

Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Roman Gezikov <roman.gezikov@atheros.com>
Cc: Joonas Viskari <joonas.viskari@atheros.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Allen Kao <allen.kao@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Link: http://lkml.kernel.org/r/1304719209-26913-1-git-send-email-lrodriguez@atheros.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-07 10:52:30 +02:00
Ingo Molnar 4cb1f43ce8 Merge commit 'v2.6.39-rc6' into x86/cleanups
Merge reason: move to a (much) newer upstream base.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-07 10:51:48 +02:00
Russell King 362607df9f Merge first four commits of 'zImage_fixes' of git://git.linaro.org/people/nico/linux into fixes 2011-05-07 08:34:40 +01:00
Nicolas Pitre ea9df3b168 ARM: zImage: the page table memory must be considered before relocation
For correctness, the initial page table located right before the
decompressed kernel should be considered when determining if relocation
is required.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
2011-05-07 00:07:58 -04:00
Nicolas Pitre adcc25915b ARM: zImage: make sure not to relocate on top of the relocation code
If the zImage load address is slightly below the relocation address,
there is a risk for the copied data to overwrite the copy loop or
cache flush code that the relocation process requires.  Always
bump the relocation address by the size of that code to avoid this
issue.

Noticed by Tony Lindgren <tony@atomide.com>.

While at it, let's start the copy from the restart symbol which makes
the above code size computation possible by the assembler directly
(same sections), given that we don't need to preserve the code before
that point anyway. And therefore we don't need to carry the _start
pointer in r5 anymore.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Tony Lindgren <tony@atomide.com>
2011-05-07 00:07:53 -04:00
Tony Lindgren 7c2527f0c4 ARM: zImage: Fix bad SP address after relocating kernel
Otherwise cache_clean_flush can overwrite some of the relocated
area depending on where the kernel image gets loaded. This fixes
booting on n900 after commit 6d7d0ae515
(ARM: 6750/1: improvements to compressed/head.S).

Thanks to Aaro Koskinen <aaro.koskinen@nokia.com> for debugging
the address of the relocated area that gets corrupted, and to
Nicolas Pitre <nicolas.pitre@linaro.org> for the other uncompress
related fixes.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-05-06 23:56:43 -04:00
Nicolas Pitre 3bd2cbb955 ARM: zImage: make sure the stack is 64-bit aligned
With ARMv5+ and EABI, the compiler expects a 64-bit aligned stack so
instructions like STRD and LDRD can be used.  Without this, mysterious
boot failures were seen semi randomly with the LZMA decompressor.

While at it, let's align .bss as well.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
CC: stable@kernel.org
2011-05-06 23:55:49 -04:00
Ingo Molnar 57d524154f Merge branch 'perf/stat' into perf/core
Merge reason: the perf stat improvements are tested and ready now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 21:07:38 +02:00
Rob Landley 6151658751 Correct occurrences of
- Documentation/kvm/ to Documentation/virtual/kvm
- Documentation/uml/ to Documentation/virtual/uml
- Documentation/lguest/ to Documentation/virtual/lguest
throughout the kernel source tree.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-05-06 09:27:55 -07:00
Peter Zijlstra 63b6a6758e perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions
The Intel Nehalem offcore bits implemented in:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

... are wrong: they implemented _ACCESS as _HIT and counted OTHER_CORE_HIT* as
MISS even though its clearly documented as an L3 hit ...

Fix them and the Westmere definitions as well.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1299119690-13991-3-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 11:24:48 +02:00
Frederic Weisbecker 925f83c085 hw_breakpoints, powerpc: Fix CONFIG_HAVE_HW_BREAKPOINT off-case in ptrace_set_debugreg()
We make use of ptrace_get_breakpoints() / ptrace_put_breakpoints() to
protect ptrace_set_debugreg() even if CONFIG_HAVE_HW_BREAKPOINT if off.
However in this case, these APIs are not implemented.

To fix this, push the protection down inside the relevant ifdef.
Best would be to export the code inside
CONFIG_HAVE_HW_BREAKPOINT into a standalone function to cleanup
the ifdefury there and call the breakpoint ref API inside. But
as it is more invasive, this should be rather made in an -rc1.

Fixes this build error:

  arch/powerpc/kernel/ptrace.c:1594: error: implicit declaration of function 'ptrace_get_breakpoints' make[2]: ***

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LPPC <linuxppc-dev@lists.ozlabs.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: v2.6.33.. <stable@kernel.org>
Link: http://lkml.kernel.org/r/1304639598-4707-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 11:24:46 +02:00
Lin Ming e04d1b23f9 perf events, x86: Add SandyBridge stalled-cycles-frontend/backend events
Extend the Intel SandyBridge PMU driver with definitions
for generic front-end and back-end stall events.

( As commit 3011203 "perf events, x86: Add Westmere stalled-cycles-frontend/backend
  events" says, these are only approximations. )

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1304666042-17577-1-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 09:37:03 +02:00
Ingo Molnar 4d70230bb4 Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into perf/urgent 2011-05-06 08:11:28 +02:00
Linus Torvalds bfd412db9e Merge branch 'for-linus' of git://github.com/at91linux/linux-2.6-at91
* 'for-linus' of git://github.com/at91linux/linux-2.6-at91:
  at91: Add ARCH_ID and basic cpu macros definition for 5series chips family.
  arm: at91: fix compiler warning for eb01 board build
  arm: at91: minimal defconfig for at91x40 SoC
  ARM: at91: AT91CAP9 has a macb device
2011-05-05 21:27:57 -07:00
Ingo Molnar 98bb318864 Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent 2011-05-04 20:33:42 +02:00
Dominik Brodowski 2d06d8c49a [CPUFREQ] use dynamic debug instead of custom infrastructure
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.

How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.

To enabled debugging during runtime, mount debugfs and

$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control

for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append

	ddebug_query="module cpufreq +p"

as a boot parameter to the kernel of your choice.

For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-05-04 11:50:57 -04:00
Naga Chumbalkar 904cc1e637 [CPUFREQ] Fix _OSC UUID in pcc-cpufreq
UUID needs to be written out the way it is described in
Sec 18.5.124 of ACPI 4.0a Specification.

Platform firmware's use of this UUID/_OSC is optional, which is
why we didn't notice this bug earlier.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
2011-05-04 11:50:56 -04:00
Linus Torvalds 609cfda586 Merge branch 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
  xen/mmu: Add workaround "x86-64, mm: Put early page table high"
2011-05-03 09:25:42 -07:00
Linus Torvalds bab0dcc717 Merge branches 'x86-fixes-for-linus' and 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, reboot: Fix relocations in reboot_32.S
  x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVEL
2011-05-03 09:23:44 -07:00
Linus Torvalds 5933f2ae35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  sysctl: net: call unregister_net_sysctl_table where needed
  Revert: veth: remove unneeded ifname code from veth_newlink()
  smsc95xx: fix reset check
  tg3: Fix failure to enable WoL by default when possible
  networking: inappropriate ioctl operation should return ENOTTY
  amd8111e: trivial typo spelling: Negotitate -> Negotiate
  ipv4: don't spam dmesg with "Using LC-trie" messages
  af_unix: Only allow recv on connected seqpacket sockets.
  mii: add support of pause frames in mii_get_an
  net: ftmac100: fix scheduling while atomic during PHY link status change
  usbnet: Transfer of maintainership
  usbnet: add support for some Huawei modems with cdc-ether ports
  bnx2: cancel timer on device removal
  iwl4965: fix "Received BA when not expected"
  iwlagn: fix "Received BA when not expected"
  dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
  usbnet: Resubmit interrupt URB if device is open
  iwl4965: fix "TX Power requested while scanning"
  iwlegacy: led stay solid on when no traffic
  b43: trivial: update module info about ucode16_mimo firmware
  ...
2011-05-02 18:00:43 -07:00
H. Peter Anvin 7806a49ab6 x86, reboot: Fix relocations in reboot_32.S
The use of base for %ebx in this file is arbitrary, *except* that we
also use it to compute the real-mode segment.  Therefore, make it so
that r_base really is the true address to which %ebx points.

This resolves kernel bugzilla 33302.

Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org
2011-05-02 14:44:46 -07:00
Stefano Stabellini b9269dc7bf xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02 16:33:52 -04:00
Konrad Rzeszutek Wilk a38647837a xen/mmu: Add workaround "x86-64, mm: Put early page table high"
As a consequence of the commit:

commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

it causes the Linux kernel to crash under Xen:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but lets ignore that).

Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].

When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.

And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02 16:33:34 -04:00
Linus Torvalds 625a3b6057 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: (47 commits)
  CLKDEV: Fix clkdev return value for NULL clk case
  ARM: 6891/1: prevent heap corruption in OABI semtimedop
  ARM: kprobes: Tidy-up kprobes-decode.c
  ARM: kprobes: Add emulation of hint instructions like NOP and WFI
  ARM: kprobes: Add emulation of SBFX, UBFX, BFI and BFC instructions
  ARM: kprobes: Add emulation of MOVW and MOVT instructions
  ARM: kprobes: Reject probing of undefined data processing instructions
  ARM: kprobes: Remove redundant code in space_1111
  ARM: kprobes: Fix emulation of PLD instructions
  ARM: kprobes: Reject probing of SETEND instructions
  ARM: kprobes: Consolidate stub decoding functions
  ARM: kprobes: Reject probing of all coprocessor instructions
  ARM: kprobes: Fix emulation of USAD8 instructions
  ARM: kprobes: Fix emulation of SMUAD, SMUSD and SMMUL instructions
  ARM: kprobes: Fix emulation of SXTB16, SXTB, SXTH, UXTB16, UXTB and UXTH instructions
  ARM: kprobes: Reject probing of undefined media instructions
  ARM: kprobes: Add emulation of RBIT instruction
  ARM: kprobes: Reject probing of LDRB instructions which load PC
  ARM: kprobes: Fix emulation of LDRD and STRD instructions
  ARM: kprobes: Reject probing of LDR/STR instructions which update PC unpredictably
  ...
2011-05-02 12:17:05 -07:00
Linus Torvalds 96f3ee2805 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] irqstats: fix counting of pfault, dasd diag and virtio irqs
  [S390] prng: fix pointer arithmetic
2011-05-02 08:47:35 -07:00
Yinghai Lu e5a10c1bd1 x86, NUMA: Trim numa meminfo with max_pfn in a separate loop
During testing 32bit numa unifying code from tj, found one system with
more than 64g fails to use numa.  It turns out we do not trim numa
meminfo correctly against max_pfn in case start address of a node is
higher than 64GiB.  Bug fix made it to tip tree.

This patch moves the checking and trimming to a separate loop.  So we
don't need to compare low/high in following merge loops.  It makes the
code more readable.

Also it makes the node merge printouts less strange.  On a 512GiB numa
system with 32bit,

before:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000)

after:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000)

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[Updated patch description and comment slightly.]
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 17:24:49 +02:00
Yinghai Lu a56bca80db x86, NUMA: Rename setup_node_bootmem() to setup_node_data()
After using memblock to replace bootmem, that function only sets up
node_data now.

Change the name to reflect what it actually does.

tj: Minor adjustment to the patch description.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 17:24:49 +02:00
Tejun Heo 1b7e03ef75 x86, NUMA: Enable emulation on 32bit too
Now that NUMA init path is unified, NUMA emulation can be enabled on
32bit.  Make numa_emluation.c safe on 32bit by doing the followings.

* Define MAX_DMA32_PFN on 32bit too.

* Include bootmem.h for max_pfn declaration.

* Use u64 explicitly and always use PFN_PHYS() when converting page
  number to address.

* Avoid __udivdi3() generation on 32bit by doing number of pages
  calculation instead in split_nodes_interleave().

And drop X86_64 dependency from Kconfig.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 2706a0bf7b x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too
Now that NUMA init path is unified, amdtopology can be enabled on
32bit.  Make amdtopology.c safe on 32bit by explicitly using u64 and
drop X86_64 dependency from Kconfig.

Inclusion of bootmem.h is added for max_pfn declaration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo c6f5887820 x86, NUMA: Rename amdtopology_64.c to amdtopology.c
amdtopology is going to be used by 32bit too drop _64 suffix.  This is
pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 752d4f372f x86, NUMA: Make numa_init_array() static
numa_init_array() no longer has users outside of numa.c.  Make it
static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo bd6709a91a x86, NUMA: Make 32bit use common NUMA init path
With both _numa_init() methods converted and the rest of init code
adjusted, numa_32.c now can switch from the 32bit only init code to
the common one in numa.c.

* Shim get_memcfg_*()'s are dropped and initmem_init() calls
  x86_numa_init(), which is updated to handle NUMAQ.

* All boilerplate operations including node range limiting, pgdat
  alloc/init are handled by numa_init().  32bit only implementation is
  removed.

* 32bit numa_add_memblk(), numa_set_distance() and
  memory_add_physaddr_to_nid() removed and common versions in
  numa_32.c enabled for 32bit.

This change causes the following behavior changes.

* NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized
  for 32bit too.

* Much more sanity checks and configuration cleanups.

* Proper handling of node distances.

* The same NUMA init messages as 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 7888e96b26 x86, NUMA: Initialize and use remap allocator from setup_node_bootmem()
setup_node_bootmem() is taken from 64bit and doesn't use remap
allocator.  It's about to be shared with 32bit so add support for it.
If NODE_DATA is remapped, it's noted in the debug message and node
locality check is skipped as the __pa() of the remapped address
doesn't reflect the actual physical address.

On 64bit, remap allocator becomes noop and doesn't affect the
behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:54 +02:00
Tejun Heo 99cca492ea x86-32, NUMA: Add @start and @end to init_alloc_remap()
Instead of dereferencing node_start/end_pfn[] directly, make
init_alloc_remap() take @start and @end and let the caller be
responsible for making sure the range is sane.  This is to prepare for
use from unified NUMA init code.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:54 +02:00
Tejun Heo 38f3e1ca24 x86, NUMA: Remove long 64bit assumption from numa.c
Code moved from numa_64.c has assumption that long is 64bit in several
places.  This patch removes the assumption by using {s|u}64_t
explicity, using PFN_PHYS() for page number -> addr conversions and
adjusting printf formats.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 744baba0c4 x86, NUMA: Enable build of generic NUMA init code on 32bit
Generic NUMA init code was moved to numa.c from numa_64.c but is still
guaraded by CONFIG_X86_64.  This patch removes the compile guard and
enables compiling on 32bit.

* numa_add_memblk() and numa_set_distance() clash with the shim
  implementation in numa_32.c and are left out.

* memory_add_physaddr_to_nid() clashes with 32bit implementation and
  is left out.

* MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32.

* node_data definition in numa_32.c removed in favor of the one in
  numa.c.

There are places where ulong is assumed to be 64bit.  The next patch
will fix them up.  Note that although the code is compiled it isn't
used yet and this patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo a4106eae65 x86, NUMA: Move NUMA init logic from numa_64.c to numa.c
Move the generic 64bit NUMA init machinery from numa_64.c to numa.c.

* node_data[], numa_mem_info and numa_distance
* numa_add_memblk[_to](), numa_remove_memblk[_from]()
* numa_set_distance() and friends
* numa_init() and all the numa_meminfo handling helpers called from it
* dummy_numa_init()
* memory_add_physaddr_to_nid()

A new function x86_numa_init() is added and the content of
numa_64.c::initmem_init() is moved into it.  initmem_init() now simply
calls x86_numa_init().

Constants and numa_off declaration are moved from numa_{32|64}.h to
numa.h.

This is code reorganization and doesn't involve any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 299a180aec x86-32, NUMA: Update numaq to use new NUMA init protocol
Update numaq such that it calls numa_add_memblk() and sets
numa_nodes_parsed instead of directly diddling with NUMA states.  The
original get_memcfg_numaq() is renamed to numaq_numa_init() and new
get_memcfg_numaq() is created in numa_32.c.

The shim numa_add_memblk() implementation handles node_start/end_pfn[]
and node_set_online() for nodes with memory.  The new
get_memcfg_numaq() exactly the same with get_memcfg_from_srat() other
than calling the numaq init function.  Things get_memcfgs_numaq() do
are not strictly necessary for numaq but added for consistency and to
help unifying NUMA init handling.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 5acd91ab83 x86-32, NUMA: Replace srat_32.c with srat.c
SRAT support implementation in srat_32.c and srat.c are generally
similar; however, there are some differences.

First of all, 64bit implementation supports more types of SRAT
entries.  64bit supports x2apic, affinity, memory and SLIT.  32bit
only supports processor and memory.

Most other differences stem from different initialization protocols
employed by 64bit and 32bit NUMA init paths.

On 64bit,

* Mappings among PXM, node and apicid are directly done in each SRAT
  entry callback.

* Memory affinity information is passed to numa_add_memblk() which
  takes care of all interfacing with NUMA init.

* Doesn't directly initialize NUMA configurations.  All the
  information is recorded in numa_nodes_parsed and memblks.

On 32bit,

* Checks numa_off.

* Things go through one more level of indirection via private tables
  but eventually end up initializing the same mappings.

* node_start/end_pfn[] are initialized and
  memblock_x86_register_active_regions() is called for each memory
  chunk.

* node_set_online() is called for each online node.

* sort_node_map() is called.

There are also other minor differences in sanity checking and messages
but taking 64bit version should be good enough.

This patch drops the 32bit specific implementation and makes the 64bit
implementation common for both 32 and 64bit.

The init protocol differences are dealt with in two places - the
numa_add_memblk() shim added in the previous patch and new temporary
numa_32.c:get_memcfg_from_srat() which wraps invocation of
x86_acpi_numa_init().

The shim numa_add_memblk() handles the folowings.

* node_start/end_pfn[] initialization.

* node_set_online() for memory nodes.

* Invocation of memblock_x86_register_active_regions().

The shim get_memcfg_from_srat() handles the followings.

* numa_off check.

* node_set_online() for CPU nodes.

* sort_node_map() invocation.

* Clearing of numa_nodes_parsed and active_ranges on failure.

The shims are temporary and will be removed as the generic NUMA init
path in 32bit is replaced with 64bit one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo b0d310801a x86-32, NUMA: implement temporary NUMA init shims
To help transition to common NUMA init, implement temporary 32bit
shims for numa_add_memblk() and numa_set_distance().
numa_add_memblk() registers the memblk and adjusts
node_start/end_pfn[].  numa_set_distance() is noop.

These shims will allow using 64bit NUMA init functions on 32bit and
gradual transition to common NUMA init path.

For detailed description, please read description of commits which
make use of the shim functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo e6df595b37 x86, NUMA: Move numa_nodes_parsed to numa.[hc]
Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for
NUMA init path unification.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00