Commit Graph

57686 Commits

Author SHA1 Message Date
Ralf Baechle 10423c91ff MIPS: Fix duplicate invocation of notify_die.
Initial patch by Yury Polyanskiy <ypolyans@princeton.edu>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2373/
2011-05-18 14:18:26 +01:00
Ralf Baechle 3436830af5 MIPS: RB532: Fix iomap resource size miscalculation.
This is the MIPS portion of Joe Perches <joe@perches.com>'s
https://patchwork.linux-mips.org/patch/2172/ which seems to have been
lost in time and space.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-18 14:18:26 +01:00
Jiri Olsa 26afb7c661 x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit
As reported in BZ #30352:

  https://bugzilla.kernel.org/show_bug.cgi?id=30352

there's a kernel bug related to reading the last allowed page on x86_64.

The _copy_to_user() and _copy_from_user() functions use the following
check for address limit:

  if (buf + size >= limit)
	fail();

while it should be more permissive:

  if (buf + size > limit)
	fail();

That's because the size represents the number of bytes being
read/write from/to buf address AND including the buf address.
So the copy function will actually never touch the limit
address even if "buf + size == limit".

Following program fails to use the last page as buffer
due to the wrong limit check:

 #include <sys/mman.h>
 #include <sys/socket.h>
 #include <assert.h>

 #define PAGE_SIZE       (4096)
 #define LAST_PAGE       ((void*)(0x7fffffffe000))

 int main()
 {
        int fds[2], err;
        void * ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
        assert(ptr == LAST_PAGE);
        err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
        assert(err == 0);
        err = send(fds[0], ptr, PAGE_SIZE, 0);
        perror("send");
        assert(err == PAGE_SIZE);
        err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
        perror("recv");
        assert(err == PAGE_SIZE);
        return 0;
 }

The other place checking the addr limit is the access_ok() function,
which is working properly. There's just a misleading comment
for the __range_not_ok() macro - which this patch fixes as well.

The last page of the user-space address range is a guard page and
Brian Gerst observed that the guard page itself due to an erratum on K8 cpus
(#121 Sequential Execution Across Non-Canonical Boundary Causes Processor
Hang).

However, the test code is using the last valid page before the guard page.
The bug is that the last byte before the guard page can't be read
because of the off-by-one error. The guard page is left in place.

This bug would normally not show up because the last page is
part of the process stack and never accessed via syscalls.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-18 12:49:00 +02:00
Linus Torvalds 39dcfa552c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, AMD: Fix ARAT feature setting again
  Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"
  x86, apic: Fix spurious error interrupts triggering on all non-boot APs
  x86, mce, AMD: Fix leaving freed data in a list
  x86: Fix UV BAU for non-consecutive nasids
  x86, UV: Fix NMI handler for UV platforms
2011-05-18 03:14:34 -07:00
Linus Torvalds 7f12b72bd8 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf evlist: Fix per thread mmap setup
  perf tools: Honour the cpu list parameter when also monitoring a thread list
  kprobes, x86: Disable irqs during optimized callback
2011-05-18 03:13:46 -07:00
Richard Weinberger b2db21997f um: fix abort
os_dump_core() uses abort() to terminate UML in case of an fatal error.

glibc's abort() calls raise(SIGABRT) which makes use of tgkill().
tgkill() has no effect within UML's kernel threads because they are not
pthreads.  As fallback abort() executes an invalid instruction to
terminate the process.  Therefore UML gets killed by SIGSEGV and leaves a
ugly log entry in the host's kernel ring buffer.

To get rid of this we use our own abort routine.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-18 02:55:23 -07:00
Fenghua Yu de5397ad5b x86, cpu: Enable/disable Supervisor Mode Execution Protection
Enable/disable newly documented SMEP (Supervisor Mode Execution Protection) CPU
feature in kernel. CR4.SMEP (bit 20) is 0 at power-on. If the feature is
supported by CPU (X86_FEATURE_SMEP), enable SMEP by setting CR4.SMEP. New kernel
option nosmep disables the feature even if the feature is supported by CPU.

[ hpa: moved the call to setup_smep() until after the vendor-specific
  initialization; that ensures that CPUID features are unmasked.  We
  will still run it before we have userspace (never mind uncontrolled
  userspace). ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1305157865-31727-1-git-send-email-fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 21:22:00 -07:00
Fenghua Yu dc23c0bccf x86, cpu: Add SMEP CPU feature in CR4
Add support for newly documented SMEP (Supervisor Mode Execution Protection)
CPU feature in CR4.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1305683069-25394-3-git-send-email-fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 21:06:42 -07:00
Fenghua Yu d0281a257f x86, cpufeature: Add cpufeature flag for SMEP
Add support for newly documented SMEP (Supervisor Mode Execution Protection) CPU
feature flag.

SMEP prevents the CPU in kernel-mode to jump to an executable page
that has the user flag set in the PTE.  This prevents the kernel from
executing user-space code accidentally or maliciously, so it for
example prevents kernel exploits from jumping to specially prepared
user-mode shell code.

[ hpa: added better description by Ingo Molnar ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1305683069-25394-2-git-send-email-fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 20:56:59 -07:00
Randy Dunlap 9bed4aca29 crypto: aesni-intel - fix aesni build on i386
Fix build error on i386 by moving function prototypes:

arch/x86/crypto/aesni-intel_glue.c: In function 'aesni_init':
arch/x86/crypto/aesni-intel_glue.c:1263: error: implicit declaration of function 'crypto_fpu_init'
arch/x86/crypto/aesni-intel_glue.c: In function 'aesni_exit':
arch/x86/crypto/aesni-intel_glue.c:1373: error: implicit declaration of function 'crypto_fpu_exit'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-05-18 09:03:34 +10:00
Fenghua Yu 2f19e06ac3 x86, mem: memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
Support memset() with enhanced rep stosb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep stosb
overrides the fast string alternative memset_c and the original function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:31 -07:00
Fenghua Yu 057e05c1d6 x86, mem: memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB
Support memmove() by enhanced rep movsb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memmove() function using enhanced rep movsb
overrides the original function.

The patch doesn't change the backward memmove case to use enhanced rep
movsb.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:30 -07:00
Fenghua Yu 101068c1f4 x86, mem: memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
Support memcpy() with enhanced rep movsb. On processors supporting enhanced
rep movsb, the alternative memcpy() function using enhanced rep movsb overrides the original function and the fast string
function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:29 -07:00
Fenghua Yu 4307bec934 x86, mem: copy_user_64.S: Support copy_to/from_user by enhanced REP MOVSB/STOSB
Support copy_to_user/copy_from_user() by enhanced REP MOVSB/STOSB.
On processors supporting enhanced REP MOVSB/STOSB, the alternative
copy_user_enhanced_fast_string function using enhanced rep movsb overrides the
original function and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:28 -07:00
Fenghua Yu e365c9df2f x86, mem: clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB
Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Support clear_page() with rep stosb for processor supporting enhanced REP MOVSB
/STOSB. On processors supporting enhanced REP MOVSB/STOSB, the alternative
clear_page_c_e function using enhanced REP STOSB overrides the original function
and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:27 -07:00
Fenghua Yu 9072d11da1 x86, alternative: Add altinstruction_entry macro
Add altinstruction_entry macro to generate .altinstructions section
entries from assembly code.  This should be less failure-prone than
open-coding.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:25 -07:00
Fenghua Yu 5097313363 x86, alternative, doc: Add comment for applying alternatives order
Some string operation functions may be patched twice, e.g. on enhanced REP MOVSB
/STOSB processors, memcpy is patched first by fast string alternative function,
then it is patched by enhanced REP MOVSB/STOSB alternative function.

Add comment for applying alternatives order to warn people who may change the
applying alternatives order for any reason.

[ Documentation-only patch ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:25 -07:00
Fenghua Yu 161ec53c70 x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB
If kernel intends to use enhanced REP MOVSB/STOSB, it must ensure
IA32_MISC_ENABLE.Fast_String_Enable (bit 0) is set and CPUID.(EAX=07H, ECX=0H):
EBX[bit 9] also reports 1.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:23 -07:00
Fenghua Yu 724a92ee45 x86, cpufeature: Add CPU feature bit for enhanced REP MOVSB/STOSB
Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 14:56:36 -07:00
Rafael J. Wysocki c650da23d5 PM: Remove CONFIG_PM_VERBOSE
Now that we have CONFIG_DYNAMIC_DEBUG there is no need for yet
another flag causing dev_dbg() and pr_debug() statements in the
core PM code to produce output.  Moreover, CONFIG_PM_VERBOSE
causes so much output to be generated that it's not really useful
and almost no one sets it.

References: https://bugzilla.kernel.org/show_bug.cgi?id=23182
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17 23:25:10 +02:00
Rafael J. Wysocki 290c748725 Merge branch 'power-domains' into for-linus
* power-domains:
  PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset
  PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops"
  OMAP1 / PM: Use generic clock manipulation routines for runtime PM
  PM / Runtime: Generic clock manipulation rountines for runtime PM (v6)
  PM / Runtime: Add subsystem data field to struct dev_pm_info
  OMAP2+ / PM: move runtime PM implementation to use device power domains
  PM / Platform: Use generic runtime PM callbacks directly
  shmobile: Use power domains for platform runtime PM
  PM: Export platform bus type's default PM callbacks
  PM: Make power domain callbacks take precedence over subsystem ones
2011-05-17 23:23:46 +02:00
Rafael J. Wysocki 2d2a9163bd Merge branch 'syscore' into for-linus
* syscore:
  PM: Remove sysdev suspend, resume and shutdown operations
  PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
  PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
  PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
  PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM
  ARM / Samsung: Use struct syscore_ops for "core" power management
  ARM / PXA: Use struct syscore_ops for "core" power management
  ARM / SA1100: Use struct syscore_ops for "core" power management
  ARM / Integrator: Use struct syscore_ops for core PM
  ARM / OMAP: Use struct syscore_ops for "core" power management
  ARM: Use struct syscore_ops instead of sysdevs for PM in common code
2011-05-17 23:23:40 +02:00
Amerigo Wang c3b0795c98 PM / ACPI: Remove acpi_sleep=s4_nonvs
acpi_sleep=s4_nonvs is superseded by acpi_sleep=nonvs, so remove it.

Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17 23:19:18 +02:00
Fenghua Yu 2494b030ba x86, cpufeature: Fix cpuid leaf 7 feature detection
CPUID leaf 7, subleaf 0 returns the maximum subleaf in EAX, not the
number of subleaves.  Since so far only subleaf 0 is defined (and only
the EBX bitfield) we do not need to qualify the test.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305660806-17519-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org> 2.6.36..39
2011-05-17 13:36:29 -07:00
Borislav Petkov 14fb57dccb x86, AMD: Fix ARAT feature setting again
Trying to enable the local APIC timer on early K8 revisions
uncovers a number of other issues with it, in conjunction with
the C1E enter path on AMD. Fixing those causes much more churn
and troubles than the benefit of using that timer brings so
don't enable it on K8 at all, falling back to the original
functionality the kernel had wrt to that.

Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Nick Bowler <nbowler@elliptictech.com>
Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17 15:28:34 +02:00
Borislav Petkov 328935e634 Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"
This reverts commit e20a2d205c, as it crashes
certain boxes with specific AMD CPU models.

Moving the lower endpoint of the Erratum 400 check to accomodate
earlier K8 revisions (A-E) opens a can of worms which is simply
not worth to fix properly by tweaking the errata checking
framework:

* missing IntPenging MSR on revisions < CG cause #GP:

http://marc.info/?l=linux-kernel&m=130541471818831

* makes earlier revisions use the LAPIC timer instead of the C1E
idle routine which switches to HPET, thus not waking up in
deeper C-states:

http://lkml.org/lkml/2011/4/24/20

Therefore, leave the original boundary starting with K8-revF.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17 15:28:33 +02:00
Ingo Molnar 86b9523ab1 Merge branch 'gart/rename' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2011-05-17 14:39:00 +02:00
David Rientjes dc382fd5bc x86, mm: Allow ZONE_DMA to be configurable
ZONE_DMA is unnecessary for a large number of machines that do not
require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI
cards with a restricted DMA address mask.

This patch allows users to disable ZONE_DMA for x86 if they know they
will not be using such devices with their kernel.

This prevents the VM from unnecessarily reserving a ratio of memory
(defaulting to 1/256th of system capacity) with lowmem_reserve_ratio
for such allocations when it will never be used.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 14:03:28 -07:00
Kristoffer Glembo d81f087f1f sparc32,leon: Remove unnecessary page_address calls in LEON DMA API.
The function mmu_inval_dma_area takes a virtual address as a parameter
which is problematic in case the buffer is located in highmem and the
mapping currently is unavailable.

Since the function was only implemented for LEON this patch removes
calls to it in non LEON code paths and renames it to dma_make_coherent
which instead takes a physical address (which for now is unused since we
flush the whole cache). This way it is possible to remove several unnecessary
calls to page_address which will fail if the virtual mapping is unavailable.

Signed-off-by: Kristoffer Glembo <kristoffer@gaisler.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:41:40 -07:00
KOSAKI Motohiro fb1fece5da sparc: convert old cpumask API into new one
Adapt new API. Almost change is trivial, most important change are to
remove following like =operator.

 cpumask_t cpu_mask = *mm_cpumask(mm);
 cpus_allowed = current->cpus_allowed;

Because cpumask_var_t is =operator unsafe. These usage might prevent
kernel core improvement.

No functional change.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:38:07 -07:00
Ondrej Zary 865be7a810 x86, cpu: Fix detection of Celeron Covington stepping A1 and B0
Steppings A1 and B0 of Celeron Covington are currently misdetected as
Pentium II (Dixon). Fix it by removing the stepping check.

[ hpa: this fixes this specific bug... the CPUID documentation
  specifies that the L2 cache size can disambiguate additional CPUs;
  this patch does not fix that. ]

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Link: http://lkml.kernel.org/r/201105162138.15416.linux@rainbow-software.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 13:24:21 -07:00
Daniel Hellstrom 55dd23eca6 sparc32, sun4d: Implemented SMP IPIs support for SUN4D machines
The sun4d does not seem to have a distingstion between soft and hard
IRQs. When generating IPIs the generated IRQ looks like a hard IRQ,
this patch adds a "IPI check" in the sun4d irq trap handler at a
predefined IRQ number (SUN4D_IPI_IRQ). Before generating an IPI
a per-cpu memory structure is modified for the "IPI check" to
successfully detect a IPI request to a specific processor, the check
clears the IPI work requested.

All three IPIs (resched, single and cpu-mask) use the same IRQ
number.

The IPI IRQ should preferrably be on a separate IRQ and definitly
not shared with IRQ handlers requesting IRQ with IRQF_SHARED.

Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:07:44 -07:00
Daniel Hellstrom ecbc42b70a sparc32, sun4m: Implemented SMP IPIs support for SUN4M machines
Implement the three IPIs (resched, single and cpu-mask) generation
and interrupt handler catch. The sun4m has 15 soft-IRQs and three
of them is used with this patch, the three IPIs was previously
implemented with the cross-call IRQ15 which does not work with
locking routines such as spinlocks because IRQ15 is NMI, it may
cause deadlock.

The IRQ trap handler code assumes (in the same spritit as the old
it seems) that hard interrupts will be generated until handled
(level), when a IRQ happens the IRQ pending register is checked
for pending soft-IRQs. When both hard and soft IRQ happens at the
same time only soft-IRQs are handled.

The old code implemented a soft-IRQ traphandler at IRQ14 which
called smp_reschedule_irq which in turn called set_need_resched.
It seems to be an old relic and is replaced with the interrupt
traphander exit code RESTORE_ALL, it calls schedule() when
appropriate.

Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:07:44 -07:00
Daniel Hellstrom 1ca0c808c6 sparc32,leon: Implemented SMP IPIs for LEON CPU
This patch implements SMP IPIs on LEON using software generated
IRQs to signal between CPUs.

The IPI IRQ number is set by using the ipi_num property in the
device tree, or defaults to 13. LEON SMP systems should reserve
IRQ 13 (and IRQ 15) to Linux in order for the defaults to work.

Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:07:43 -07:00
Daniel Hellstrom d6d048192b sparc32: implement SMP IPIs using the generic functions
The current sparc32 SMP IPI generation is implemented the
cross call function. The cross call function uses IRQ15 the
NMI, this is has the effect that IPIs will interrupt IRQ
critical areas and hang the system. Typically on/after
spin_lock_irqsave calls can be aborted.

The cross call functionality must still exist to flush
cache/TLBS.

This patch provides CPU models a custom way to implement
generation of IPIs on the generic code's request. The
typical approach is to generate an IRQ for each IPI case.

After this patch each sparc32 SMP CPU model needs to
implement IPIs in order to function properly.

Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:07:43 -07:00
Daniel Hellstrom 2645e7219e sparc32,leon: SMP power down implementation
Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:03:28 -07:00
Daniel Hellstrom 5149bed891 sparc32,leon: added some SMP comments
Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:03:28 -07:00
Martin Schwidefsky f296388682 ftrace/s390: mcount offset calculation
Do the mcount offset adjustment in the recordmcount.pl/recordmcount.[ch]
at compile time and not in ftrace_call_adjust at run time.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 15:05:06 -04:00
Martin Schwidefsky 521ccb5c4a ftrace/x86: mcount offset calculation
Do the mcount offset adjustment in the recordmcount.pl/recordmcount.[ch]
at compile time and not in ftrace_call_adjust at run time.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 14:55:57 -04:00
Steven Rostedt 2895cd2ab8 ftrace/x86: Do not trace .discard.text section
The section called .discard.text has tracing attached to it and is
currently ignored by ftrace. But it does include a call to the mcount
stub. Adding a notrace to the code keeps gcc from adding the useless
mcount caller to it.

Link: http://lkml.kernel.org/r/20110421023739.243651696@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 14:47:13 -04:00
Frank Arnold 42be450565 x86, AMD, cacheinfo: Fix L3 cache index disable checks
We provide two slots to disable cache indices, and have a check to
prevent both slots to be used for the same index.

If the user disables the same index on different subcaches, both slots
will hold the same index, e.g.

  $ echo 2047 > /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_0
  $ cat /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_0
  2047
  $ echo 1050623 > /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_1
  $ cat /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_1
  2047

due to the fact that the check was looking only at index bits [11:0]
and was ignoring writes to bits outside that range. The more correct
fix is to simply check whether the index is within the bounds of
[0..l3->indices].

While at it, cleanup comments and drop now-unused local macros.

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
Link: http://lkml.kernel.org/r/1305553188-21061-3-git-send-email-bp@amd64.org
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 11:24:27 -07:00
Borislav Petkov 50e7534427 x86, AMD, cacheinfo: Fix fallout caused by max3 conversion
732eacc054 converted code around the
kernel using nested max() macros to use the new max3 macro but forgot to
remove the old line in intel_cacheinfo.c. Fix it.

Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Frank Arnold <farnold@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305553188-21061-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 11:24:23 -07:00
Rafael J. Wysocki 600b776eb3 OMAP1 / PM: Use generic clock manipulation routines for runtime PM
Convert OMAP1 to using the new generic clock manipulation routines
and a device power domain for runtime PM instead of overriding the
platform bus type's runtime PM callbacks.  This allows us to simplify
OMAP1-specific code and to share some code with other platforms
(shmobile in particular).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Kevin Hilman <khilman@ti.com>
2011-05-16 20:15:36 +02:00
Konrad Rzeszutek Wilk 7c1bfd685b xen/pci: Fix compiler error when CONFIG_XEN_PRIVILEGED_GUEST is not set.
If we have CONFIG_XEN and the other parameters to build an
Linux kernel that is non-privileged, the xen_[find|register|unregister]_
device_domain_owner functions should not be compiled. They should
use the nops defined in arch/x86/include/asm/xen/pci.h instead.

This fixes:

arch/x86/pci/xen.c:496: error: redefinition of ‘xen_find_device_domain_owner’
arch/x86/include/asm/xen/pci.h:25: note: previous definition of ‘xen_find_device_domain_owner’ was here
arch/x86/pci/xen.c:510: error: redefinition of ‘xen_register_device_domain_owner’
arch/x86/include/asm/xen/pci.h:29: note: previous definition of ‘xen_register_device_domain_owner’ was here
arch/x86/pci/xen.c:532: error: redefinition of ‘xen_unregister_device_domain_owner’
arch/x86/include/asm/xen/pci.h:34: note: previous definition of ‘xen_unregister_device_domain_owner’ was here

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-05-16 13:47:30 -04:00
Linus Torvalds df8d06ade6 Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  OMAP3: set the core dpll clk rate in its set_rate function
  omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured
2011-05-16 08:55:49 -07:00
Youquan Song e503f9e4b0 x86, apic: Fix spurious error interrupts triggering on all non-boot APs
This patch fixes a bug reported by a customer, who found
that many unreasonable error interrupts reported on all
non-boot CPUs (APs) during the system boot stage.

According to Chapter 10 of Intel Software Developer Manual
Volume 3A, Local APIC may signal an illegal vector error when
an LVT entry is set as an illegal vector value (0~15) under
FIXED delivery mode (bits 8-11 is 0), regardless of whether
the mask bit is set or an interrupt actually happen. These
errors are seen as error interrupts.

The initial value of thermal LVT entries on all APs always reads
0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
sequence to them and LVT registers are reset to 0s except for
the mask bits which are set to 1s when APs receive INIT IPI.

When the BIOS takes over the thermal throttling interrupt,
the LVT thermal deliver mode should be SMI and it is required
from the kernel to keep AP's LVT thermal monitoring register
programmed as such as well.

This issue happens when BIOS does not take over thermal throttling
interrupt, AP's LVT thermal monitor register will be restored to
0x10000 which means vector 0 and fixed deliver mode, so all APs will
signal illegal vector error interrupts.

This patch check if interrupt delivery mode is not fixed mode before
restoring AP's LVT thermal monitor register.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
Cc: hpa@linux.intel.com
Cc: joe@perches.com
Cc: jbaron@redhat.com
Cc: trenn@suse.de
Cc: kent.liu@intel.com
Cc: chaohong.guo@intel.com
Cc: <stable@kernel.org> # As far back as possible
Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-16 13:48:25 +02:00
Andy Lutomirski b23b645165 crypto: aesni-intel - Merge with fpu.ko
Loading fpu without aesni-intel does nothing.  Loading aesni-intel
without fpu causes modes like xts to fail.  (Unloading
aesni-intel will restore those modes.)

One solution would be to make aesni-intel depend on fpu, but it
seems cleaner to just combine the modules.

This is probably responsible for bugs like:
https://bugzilla.redhat.com/show_bug.cgi?id=589390

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-05-16 15:12:47 +10:00
Thomas Gleixner a18f22a968 Merge branch 'consolidate-clksrc-i8253' of master.kernel.org:~rmk/linux-2.6-arm into timers/clocksource
Conflicts:
	arch/ia64/kernel/cyclone.c
	arch/mips/kernel/i8253.c
	arch/x86/kernel/i8253.c

Reason: Resolve conflicts so further cleanups do not conflict further

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-14 12:06:36 +02:00
Russell King 798778b865 clocksource: convert mips to generic i8253 clocksource
Convert MIPS i8253 clocksource code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-14 10:29:48 +01:00
Russell King 82491451dd clocksource: convert x86 to generic i8253 clocksource
Convert x86 i8253 clocksource code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-14 10:29:48 +01:00
Russell King 8c414ff3f4 clocksource: convert footbridge to generic i8253 clocksource
Convert the footbridge isa-timer code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-14 10:29:48 +01:00
Michael Cree 90b57f3516 alpha: Wire up syscalls new to 2.6.39
Wire up the syscalls:
   name_to_handle_at
   open_by_handle_at
   clock_adjtime
   syncfs
and adjust some whitespace in the neighbourhood to align commments.

Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-05-13 19:16:11 -04:00
John Stultz f550806a7f alpha: convert to clocksource_register_hz
Converts alpha to use clocksource_register_hz.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-05-13 19:16:10 -04:00
Greg Kroah-Hartman 82a3242e11 sysfs: remove "last sysfs file:" line from the oops messages
On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.

This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.

So it's time to delete the line.  This is good as we need all the space
we can get for oops messages at times on consoles.

Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-13 16:05:51 -07:00
Julia Lawall d9a5ac9ef3 x86, mce, AMD: Fix leaving freed data in a list
b may be added to a list, but is not removed before being freed
in the case of an error.  This is done in the corresponding
deallocation function, so the code here has been changed to
follow that.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E,E1,E2;
identifier l;
@@

*list_add(&E->l,E1);
... when != E1
    when != list_del(&E->l)
    when != list_del_init(&E->l)
    when != E = E2
*kfree(E);// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-13 17:11:02 +02:00
Avinash H.M 5fd2a84ab3 OMAP3: set the core dpll clk rate in its set_rate function
The debug l3_ick/rate is not displaying the actual rate of the clock in
hardware. This is because, the core dpll set_rate function doesn't update the
clk.rate. After fixing, the l3_ick/rate is displaying proper values.

Signed-off-by: Shweta Gulati <shweta.gulati@ti.com>
Signed-off-by: Avinash.H.M <avinashhm@ti.com>
Cc: Rajendra Nayak <rnayak@ti.com>
Cc: Paul Wamsley <paul@pwsan.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2011-05-13 07:08:18 -07:00
Cliff Wickman 77ed23f8d9 x86: Fix UV BAU for non-consecutive nasids
This is a fix for the SGI Altix-UV Broadcast Assist Unit code,
which is used for TLB flushing.

Certain hardware configurations (that customers are ordering)
cause nasids (numa address space id's) to be non-consecutive.
Specifically, once you have more than 4 blades in a IRU
(Individual Rack Unit - or 1/2 rack) but less than the maximum
of 16, the nasid numbering becomes non-consecutive.  This
currently results in a 'catastrophic error' (CATERR) detected by
the firmware during OS boot.  The BAU is generating an 'INTD'
request that is targeting a non-existent nasid value. Such
configurations may also occur when a blade is configured off
because of hardware errors. (There is one UV hub per blade.)

This patch is required to support such configurations.

The problem with the tlb_uv.c code is that is using the
consecutive hub numbers as indices to the BAU distribution bit
map. These are simply the ordinal position of the hub or blade
within its partition.  It should be using physical node numbers
(pnodes), which correspond to the physical nasid values. Use of
the hub number only works as long as the nasids in the partition
are consecutive and increase with a stride of 1.

This patch changes the index to be the pnode number, thus
allowing nasids to be non-consecutive.
It also provides a table in local memory for each cpu to
translate target cpu number to target pnode and nasid.
And it improves naming to properly reflect 'node' and 'uvhub'
versus 'nasid'.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-12 23:45:42 +02:00
Daniel Kiper ae15a3b4d1 arch/x86/xen/setup: Cleanup code/data sections definitions
Cleanup code/data sections definitions
accordingly to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:34 -04:00
Daniel Kiper ad3062a0f4 arch/x86/xen/enlighten: Cleanup code/data sections definitions
Cleanup code/data sections definitions
accordingly to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:34 -04:00
Daniel Kiper 251511a18d arch/x86/xen/irq: Cleanup code/data sections definitions
Cleanup code/data sections definitions
accordingly to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:33 -04:00
Jan Andersson 21dccddf45 sparc: add {read,write}*_be routines
This patch adds {read,write}*_be big endian memory access
routines to the io.h header used on SPARC32 and SPARC64.

Tested on SPARC32 (LEON)

Signed-off-by: Jan Andersson <jan@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 13:44:29 -07:00
Linus Torvalds 0c5e1577f1 Merge branch 'stable/bug-fixes-for-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-for-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  x86/mm: Fix section mismatch derived from native_pagetable_reserve()
  x86,xen: introduce x86_init.mapping.pagetable_reserve
  Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""
2011-05-12 12:21:51 -07:00
Konrad Rzeszutek Wilk 8c5950881c xen/p2m: Create entries in the P2M_MFN trees's to track 1-1 mappings
.. when applicable. We need to track in the p2m_mfn and
p2m_mfn_p the MFNs and pointers, respectivly, for the P2M entries
that are allocated for the identity mappings. Without this,
a PV domain with an E820 that triggers the 1-1 mapping to kick in,
won't be able to be restored as the P2M won't have the identity
mappings.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:38:53 -04:00
Daniel Kiper 0f16d0dfcd xen/setup: Fix for incorrect xen_extra_mem_start initialization under 32-bit
git commit 24bdb0b62c (xen: do not create
the extra e820 region at an addr lower than 4G) does not take into
account that ifdef CONFIG_X86_32 instead of e820_end_of_low_ram_pfn()
find_low_pfn_range() is called (both calls are from arch/x86/kernel/setup.c).
find_low_pfn_range() behaves correctly and does not require change in
xen_extra_mem_start initialization. Additionally, if xen_extra_mem_start
is initialized in the same way as ifdef CONFIG_X86_64 then memory hotplug
support for Xen balloon driver (under development) is broken.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:37:06 -04:00
Konrad Rzeszutek Wilk 15bfc09451 xen/setup: Ignore E820_UNUSABLE when setting 1-1 mappings.
When we parse the raw E820, the Xen hypervisor can set "E820_RAM"
to "E820_UNUSABLE" if the mem=X argument is used. As such we
should _not_ consider the E820_UNUSABLE as an 1-1 identity
mapping, but instead use the same case as for E820_RAM.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:32:13 -04:00
Tian, Kevin 7899891c7d xen mmu: fix a race window causing leave_mm BUG()
There's a race window in xen_drop_mm_ref, where remote cpu may exit
dirty bitmap between the check on this cpu and the point where remote
cpu handles drop request. So in drop_other_mm_ref we need check
whether TLB state is still lazy before calling into leave_mm. This
bug is rarely observed in earlier kernel, but exaggerated by the
commit 831d52bc15
("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
which clears bitmap after changing the TLB state. the call trace is as below:

---------------------------------
kernel BUG at arch/x86/mm/tlb.c:61!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
CPU 1
Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285
RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
RSP: e02b:ffff88002805be48  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0
RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200
R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880
R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000
FS:  00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000 00
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40)
Stack:
 ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88
<0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78
<0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108
Call Trace:
 <IRQ>
 [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53
 [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
 [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28
 [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
 [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
 [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d
 [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46
 [<ffffffff81013efe>] xen_do_hyper visor_callback+0x1e/0x30
 <EOI>
 [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255
 [<ffffffff81114362>] ? do_execve+0x1c3/0x29e
 [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
 [<ffffffff 8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
 [<ffffffff81013daa>] ? child_rip+0xa/0x20
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81013da0>] ? child_rip+0x0/0x20
Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
 RSP <ffff88002805be48>
---[ end trace ce9cee6832a9c503 ]---

Tested-by: Maoxiaoyun<tinnycloud@hotmail.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
[v1: Fleshed out the git description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:27:43 -04:00
Sedat Dilek 53f8023feb x86/mm: Fix section mismatch derived from native_pagetable_reserve()
With CONFIG_DEBUG_SECTION_MISMATCH=y I see these warnings in next-20110415:

  LD      vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x1ba48): Section mismatch in reference from the function native_pagetable_reserve() to the function .init.text:memblock_x86_reserve_range()
The function native_pagetable_reserve() references
the function __init memblock_x86_reserve_range().
This is often because native_pagetable_reserve lacks a __init
annotation or the annotation of memblock_x86_reserve_range is wrong.

This patch fixes the issue.
Thanks to pipacs from PaX project for help on IRC.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:05:05 -04:00
Stefano Stabellini 279b706bf8 x86,xen: introduce x86_init.mapping.pagetable_reserve
Introduce a new x86_init hook called pagetable_reserve that at the end
of init_memory_mapping is used to reserve a range of memory addresses for
the kernel pagetable pages we used and free the other ones.

On native it just calls memblock_x86_reserve_range while on xen it also
takes care of setting the spare memory previously allocated
for kernel pagetable pages from RO to RW, so that it can be used for
other purposes.

A detailed explanation of the reason why this hook is needed follows.

As a consequence of the commit:

commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason we need a hook to reserve the kernel pagetable pages we
used and free the other ones so that they can be reused for other
purposes.
On native it just means calling memblock_x86_reserve_range, on Xen it
also means marking RW the pagetable pages that we allocated before but
that haven't been used before.

Another way to fix this is without using the hook is by adding a 'if
(xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
counterpart, but that is just nasty.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:05:04 -04:00
Konrad Rzeszutek Wilk 92bdaef7b2 Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""
This reverts commit a38647837a.

It does not work with certain AMD machines.

last_pfn = 0x100000 max_arch_pfn = 0x400000000
initial memory mapped : 0 - 02c3a000
Base memory trampoline at [ffff88000009b000] 9b000 size 20480
init_memory_mapping: 0000000000000000-0000000100000000
 0000000000 - 0100000000 page 4k
kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
init_memory_mapping: 0000000100000000-00000001e0800000
 0100000000 - 01e0800000 page 4k
kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
xen: setting RW the range fffdc000 - 100000000
RAMDISK: 0203b000 - 02c3a000
No NUMA configuration found
Faking a node at 0000000000000000-00000001e0800000
NUMA: Using 63 for the hash shift.
Initmem setup node 0 0000000000000000-00000001e0800000
  NODE_DATA [00000001dfffb000 - 00000001dfffffff]
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
PGD 0
Oops: 0003 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
Stack:
 0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
 ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
 ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
Call Trace:
 [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6f67>] numa_init+0x6c/0x70
 [<ffffffff81cf7057>] initmem_init+0x39/0x3b
 [<ffffffff81ce5865>] setup_arch+0x64e/0x769
 [<ffffffff815e43c1>] ? printk+0x51/0x53
 [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
 [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
 [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
 RSP <ffffffff81c01e38>
CR2: 0000000000000000
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:04:29 -04:00
Linus Torvalds 8043f4eb85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic
  sparc32: fix sparcstation 5 boot
  sparc32: fix section mismatch warnings in apc, pmc and time_32
2011-05-12 07:53:34 -07:00
Linus Torvalds 75c0b3b466 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses
  ARM: 6892/1: handle ptrace requests to change PC during interrupted system calls
  ARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM
  ARM: zImage: the page table memory must be considered before relocation
  ARM: zImage: make sure not to relocate on top of the relocation code
  ARM: zImage: Fix bad SP address after relocating kernel
  ARM: zImage: make sure the stack is 64-bit aligned
  ARM: RiscPC: acornfb: fix section mismatches
  ARM: RiscPC: etherh: fix section mismatches
2011-05-12 07:53:06 -07:00
Richard Weinberger 449a66fd1f x86: Remove warning and warning_symbol from struct stacktrace_ops
Both warning and warning_symbol are nowhere used.
Let's get rid of them.

Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Soeren Sandmann Pedersen <ssp@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: x86 <x86@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Link: http://lkml.kernel.org/r/1305205872-10321-2-git-send-email-richard@nod.at
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2011-05-12 15:31:28 +02:00
Catalin Marinas a904f5f9eb ARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses
Since mandatory barriers may be used (explicitly or implicitly via readl
etc.) to ensure the ordering between Device and Normal memory accesses,
a DMB is not enough. This patch converts it to a DSB.

Cc: Colin Cross <ccross@android.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-12 10:52:00 +01:00
Arnd Bergmann 2af68df02f ARM: 6892/1: handle ptrace requests to change PC during interrupted system calls
GDB's interrupt.exp test cases currenly fail on ARM.  The problem is how do_signal
handled restarting interrupted system calls:

The entry.S assembler code determines that we come from a system call; and that
information is passed as "syscall" parameter to do_signal.  That routine then
calls get_signal_to_deliver [*] and if a signal is to be delivered, calls into
handle_signal.  If a system call is to be restarted either after the signal
handler returns, or if no handler is to be called in the first place, the PC
is updated after the get_signal_to_deliver call, either in handle_signal (if
we have a handler) or at the end of do_signal (otherwise).

Now the problem is that during [*], the call to get_signal_to_deliver, a ptrace
intercept may happen.  During this intercept, the debugger may change registers,
including the PC.  This is done by GDB if it wants to execute an "inferior call",
i.e. the execution of some code in the debugged program triggered by GDB.

To this purpose, GDB will save all registers, allocate a stack frame, set up
PC and arguments as appropriate for the call, and point the link register to
a dummy breakpoint instruction.  Once the process is restarted, it will execute
the call and then trap back to the debugger, at which point GDB will restore
all registers and continue original execution.

This generally works fine.  However, now consider what happens when GDB attempts
to do exactly that while the process was interrupted during execution of a to-be-
restarted system call:  do_signal is called with the syscall flag set; it calls
get_signal_to_deliver, at which point the debugger takes over and changes the PC
to point to a completely different place.  Now get_signal_to_deliver returns
without a signal to deliver; but now do_signal decides it should be restarting
a system call, and decrements the PC by 2 or 4 -- so it now points to 2 or 4
bytes before the function GDB wants to call -- which leads to a subsequent crash.

To fix this problem, two things need to be supported:
- do_signal must be able to recognize that get_signal_to_deliver changed the PC
  to a different location, and skip the restart-syscall sequence
- once the debugger has restored all registers at the end of the inferior call
  sequence, do_signal must recognize that *now* it needs to restart the pending
  system call, even though it was now entered from a breakpoint instead of an
  actual svc instruction

This set of issues is solved on other platforms, usually by one of two
mechanisms:

- The status information "do_signal is handling a system call that may need
  restarting" is itself carried in some register that can be accessed via
  ptrace.  This is e.g. on Intel the "orig_eax" register; on Sparc the kernel
  defines a magic extra bit in the flags register for this purpose.
  This allows GDB to manage that state: reset it when doing an inferior call,
  and restore it after the call is finished.

- On s390, do_signal transparently handles this problem without requiring
  GDB interaction, by performing system call restarting in the following
  way: first, adjust the PC as necessary for restarting the call.  Then,
  call get_signal_to_deliver; and finally just continue execution at the
  PC.  This way, if GDB does not change the PC, everything is as before.
  If GDB *does* change the PC, execution will simply continue there --
  and once GDB restores the PC it saved at that point, it will automatically
  point to the *restarted* system call.  (There is the minor twist how to
  handle system calls that do *not* need restarting -- do_signal will undo
  the PC change in this case, after get_signal_to_deliver has returned, and
  only if ptrace did not change the PC during that call.)

Because there does not appear to be any obvious register to carry the
syscall-restart information on ARM, we'd either have to introduce a new
artificial ptrace register just for that purpose, or else handle the issue
transparently like on s390.  The patch below implements the second option;
using this patch makes the interrupt.exp test cases pass on ARM, with no
regression in the GDB test suite otherwise.

Cc: patches@linaro.org
Signed-off-by: Ulrich Weigand <ulrich.weigand@linaro.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-12 10:52:00 +01:00
Will Deacon 9af386c8dc ARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM
The SPARSEMEM code allocates memmap entries only for sections which are
present (i.e. those which contain some valid memory). The membank checks
in free_unused_memmap do not take this into account and can incorrectly
attempt to free memory which is not allocated, resulting in a BUG() in
the bootmem code.

However, if memory is configured as follows:

    |<----section---->|<----hole---->|<----section---->|
    +--------+--------+--------------+--------+--------+
    | bank 0 | unused |              | bank 1 | unused |
    +--------+--------+--------------+--------+--------+

where a bank only occupies part of a section, the memmap allocated for
the remainder of the section *can* be freed.

This patch modifies the checks in free_unused_memmap so that only valid
memmap entries are considered for removal.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-12 10:52:00 +01:00
Ingo Molnar 9cb5baba5e Merge commit 'v2.6.39-rc7' into sched/core 2011-05-12 09:36:18 +02:00
Tkhai Kirill b1054282d7 sparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic
When we are in the label cc_dword_align, registers %o0 and %o1 have the same last 2 bits,
but it's not guaranteed one of them is zero. So we can get unaligned memory access
in label ccte. Example of parameters which lead to this:
%o0=0x7ff183e9, %o1=0x8e709e7d, %g1=3

With the parameters I had a memory corruption, when the additional 5 bytes were rewritten.
This patch corrects the error.

One comment to the patch. We don't care about the third bit in %o1, because cc_end_cruft
stores word or less.

Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-11 21:35:04 -07:00
Linus Torvalds 409ab140e2 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] fix alloc_pgste check in init_new_context
  [S390] oprofile: fix min/max interval query checks
  [S390] replace diag10() with diag10_range() function
  [S390] disassembler: handle b280/spp instruction
  [S390] kernel: Initialize register 14 when starting new CPU
  [S390] dasd: prevent IO error during reserve/release loop
  [S390] sclp/memory hotplug: fix initial usecount of increments
2011-05-11 18:59:45 -07:00
Rafael J. Wysocki 2e711c04db PM: Remove sysdev suspend, resume and shutdown operations
Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them.  Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly.  This reduces kernel code size quite a bit and reduces
its complexity.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-11 21:37:15 +02:00
Rafael J. Wysocki f5a592f7d7 PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
Make some PowerPC architecture's code use struct syscore_ops
objects for power management instead of sysdev classes and sysdevs.

This simplifies the code and reduces the kernel's memory footprint.
It also is necessary for removing sysdevs from the kernel entirely in
the future.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-11 21:37:15 +02:00
Rafael J. Wysocki f98bf4aa39 PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
Make some UNICORE32 architecture's code use struct syscore_ops
objects for power management instead of sysdev classes and sysdevs.

This simplifies the code and reduces the kernel's memory footprint.
It also is necessary for removing sysdevs from the kernel entirely in
the future.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
2011-05-11 21:37:15 +02:00
Rafael J. Wysocki f25f4f522a PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
Convert some AVR32 architecture's code to using struct syscore_ops
objects for power management instead of sysdev classes and sysdevs.

This simplifies the code and reduces the kernel's memory footprint.
It also is necessary for removing sysdevs from the kernel entirely in
the future.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
2011-05-11 21:37:14 +02:00
Laurent Pinchart c56b2ddd5f omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured
Commit d594f1f31a (omap: IOMMU: add
support to callback during fault handling) broke interrupt line sharing
between the OMAP3 ISP and its IOMMU. Because of this, every interrupt
generated by the OMAP3 ISP is handled by the IOMMU driver instead of
being passed to the OMAP3 ISP driver.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2011-05-11 10:47:50 -07:00
Jiri Olsa 9bbeacf52f kprobes, x86: Disable irqs during optimized callback
Disable irqs during optimized callback, so we dont miss any in-irq kprobes.

The following commands:

 # cd /debug/tracing/
 # echo "p mutex_unlock" >> kprobe_events
 # echo "p _raw_spin_lock" >> kprobe_events
 # echo "p smp_apic_timer_interrupt" >> ./kprobe_events
 # echo 1 > events/enable

Cause the optimized kprobes to be missed. None is missed
with the fix applied.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20110511110613.GB2390@jolsa.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-11 13:21:23 +02:00
Ingo Molnar 2c14ddc135 Merge branch 'iommu/2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2011-05-10 22:15:57 +02:00
Manuel Lauss 780914c3cf MIPS: Alchemy: fix xxs1500 build error
This fixes:
alchemy/xxs1500/init.c: In function 'prom_init':
alchemy/xxs1500/init.c:57:17: error: ignoring return value of 'kstrtoul', declared with attribute warn_unused_result

Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
Cc: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2340/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
David Daney 310f130339 MIPS: Invalidate old TLB mappings when updating huge page PTEs.
Without this, stale Icache or TLB entries may be used.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
https://patchwork.linux-mips.org/patch/2318/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Wu Zhangjin f850548ef8 MIPS: Hibernation: Fixes for PAGE_SIZE >= 64kb
PAGE_SIZE >= 64kb (1 << 16) is too big to be the immediate of the
addiu/daddiu instruction, so, use addu/daddu instruction instead.

The following compiling error is fixed:

AS      arch/mips/power/hibernate.o
arch/mips/power/hibernate.S: Assembler messages:
arch/mips/power/hibernate.S:38: Error: expression out of range
make[2]: *** [arch/mips/power/hibernate.o] Error 1
make[1]: *** [arch/mips/power] Error 2

Reported-by: Roman Mamedov <rm@romanrm.ru>
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2313/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Lars-Peter Clausen 1e2bbde4af MIPS: JZ4740: Set one-shot feature flag for the clockevent
The code for supporting one-shot mode for the clockevent is already there,
only the feature flag was not set.  Setting the one-shot flag allows the
kernel to run in tickless mode.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2261/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Ralf Baechle aa7ce1c303 MIPS: JZ4740: Export symbols to the watchdog driver module
MODPOST 356 modules
ERROR: "jz4740_timer_disable_watchdog" [drivers/watchdog/jz4740_wdt.ko] undefine
d!
ERROR: "jz4740_timer_enable_watchdog" [drivers/watchdog/jz4740_wdt.ko] undefined
!
make[1]: *** [__modpost] Error 1

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Ralf Baechle f1b6a5054c MIPS: JZ4740: Fix GCC 4.6.0 build error.
CC      arch/mips/jz4740/dma.o
arch/mips/jz4740/dma.c: In function 'jz4740_dma_chan_irq':
arch/mips/jz4740/dma.c:245:11: error: variable 'status' set but not used [-Werro
r=unused-but-set-variable]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:26 +01:00
Ralf Baechle b20bff02b2 MIPS: Audit: Fix success success argument pass to audit_syscall_exit
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle 893d20fbae MIPS: Fix calc_vmlinuz_load_addr build warnings.
HOSTCC  arch/mips/boot/compressed/calc_vmlinuz_load_addr
arch/mips/boot/compressed/calc_vmlinuz_load_addr.c: In function 'main':
arch/mips/boot/compressed/calc_vmlinuz_load_addr.c:35:2: warning: format '%llx' expects type 'long long unsigned int *', but argument 3 has type 'uint64_t *'
arch/mips/boot/compressed/calc_vmlinuz_load_addr.c:54:2: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'uint64_t'

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle 403fbdff96 MIPS: Alchemy: Fix GCC 4.6.0 build error.
CC      arch/mips/alchemy/devboards/db1x00/board_setup.o
arch/mips/alchemy/devboards/db1x00/board_setup.c: In function 'board_setup':
arch/mips/alchemy/devboards/db1x00/board_setup.c:130:6: error: variable 'pin_func' set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle 8bdd51429d MIPS: Document former use of timerfd(2) syscall number.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle e12f47ef16 MIPS: IP27: Fix GCC 4.6.0 build error.
CC      arch/mips/sgi-ip27/ip27-hubio.o
arch/mips/sgi-ip27/ip27-hubio.c: In function 'hub_pio_map':
arch/mips/sgi-ip27/ip27-hubio.c:32:20: error: variable 'junk' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Ralf Baechle a6ab5ca394 MIPS: IP27: Fix GCC 4.6.0 build error.
CC      arch/mips/sgi-ip27/ip27-hubio.o
arch/mips/sgi-ip27/ip27-hubio.c: In function 'hub_pio_map':
arch/mips/sgi-ip27/ip27-hubio.c:32:20: error: variable 'junk' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:25 +01:00
Jonas Gorski 7da34c1dac MIPS: bcm63xx: Fix header_crc comment in bcm963xx_tag.h
The CRC32 actually includes the tag_version.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2275/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
David Daney 23a271ecdf MIPS: Octeon: Guard the Kconfig body with CPU_CAVIUM_OCTEON
Instead of making each Octeon specific option depend on
CPU_CAVIUM_OCTEON, gate the body of the entire file with
CPU_CAVIUM_OCTEON.  With this change, CAVIUM_OCTEON_SPECIFIC_OPTIONS
becomes useless, so get rid of it as well.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2091/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
David Daney e3fb3f27a7 MIPS: Octeon: Cleanup Kconfig IRQ_CPU* symbols.
Octeon doesn't use IRQ_CPU, so don't select it.

IRQ_CPU_OCTEON is a completely unused symbol, remove it completely.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2086/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
Catalin Marinas f8bec75acd MIPS: Rename .data..mostly and properly handle it in linker script
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
Ralf Baechle 866d7f5622 MIPS: MSP: Fix build error
Reported and original patch by Yoichi Yuasa <yuasa@linux-mips.org>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:24 +01:00
Yoichi Yuasa 088a42acc4 MIPS: MSP71xx: Fix typo in msp_per_irq_controller
CC      arch/mips/pmc-sierra/msp71xx/msp_irq_per.o
arch/mips/pmc-sierra/msp71xx/msp_irq_per.c:101:2: error: expected identifier before '.' token
make[2]: *** [arch/mips/pmc-sierra/msp71xx/msp_irq_per.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2246/
Cc: linux-mips <linux-mips@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle c87444af6f MIPS: Loongson: Fix GCC 2.6.0 build error.
CC      arch/mips/loongson/common/env.o
arch/mips/loongson/common/env.c: In function 'prom_init_env':
arch/mips/loongson/common/env.c:50:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
arch/mips/loongson/common/env.c:51:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
arch/mips/loongson/common/env.c:52:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
arch/mips/loongson/common/env.c:53:12: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 84d3b0dbac MIPS: Jazz: Fix GCC 4.6.0 build error
CC      arch/mips/jazz/jazzdma.o
arch/mips/jazz/jazzdma.c: In function 'vdma_remap':
arch/mips/jazz/jazzdma.c:214:20: error: variable 'npages' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 11b9d0eca5 MIPS: SNI: Fix GCC 4.6.0 build error
CC      arch/mips/sni/time.o
arch/mips/sni/time.c: In function 'dosample':
arch/mips/sni/time.c:98:19: error: variable 'lsb' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 6be63bbbda MIPS: Malta: Fix GCC 4.6.0 build error
CC      arch/mips/mti-malta/malta-int.o
arch/mips/mti-malta/malta-int.c: In function 'mips_pcibios_iack':
arch/mips/mti-malta/malta-int.c:59:6: error: variable 'dummy' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle af3a1f6f48 MIPS: Malta: Fix GCC 4.6.0 build error
CC      arch/mips/mti-malta/malta-init.o
arch/mips/mti-malta/malta-init.c: In function 'prom_init':
arch/mips/mti-malta/malta-init.c:196:6: error: variable 'result' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:23 +01:00
Ralf Baechle 3be1afc8f6 MIPS: IP22: Fix GCC 4.6.0 build error
CC      arch/mips/sgi-ip22/ip22-platform.o
arch/mips/sgi-ip22/ip22-platform.c: In function 'sgiseeq_devinit':
arch/mips/sgi-ip22/ip22-platform.c:135:15: error: variable 'tmp' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

While at it rename the variable to pbdma for readability; there is a
local variable tmp of different type being used in two nested blocks.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Ralf Baechle 6fd78fc1fa MIPS: IP22: Fix GCC 4.6.0 build error
CC      arch/mips/sgi-ip22/ip22-time.o
arch/mips/sgi-ip22/ip22-time.c: In function 'dosample':
arch/mips/sgi-ip22/ip22-time.c:35:10: error: variable 'lsb' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Ralf Baechle 4a9040f451 MIPS: tlbex: Fix GCC 4.6.0 build error
CC      arch/mips/mm/tlbex.o
arch/mips/mm/tlbex.c: In function 'build_r4000_tlb_refill_handler':
arch/mips/mm/tlbex.c:1155:22: error: variable 'vmalloc_mode' set but not used [-Werror=unused-but-set-variable]
arch/mips/mm/tlbex.c:1154:28: error: variable 'htlb_info' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Ralf Baechle 71271aab8c MIPS: c-r4k: Fix GCC 4.6.0 build error
CC      arch/mips/mm/c-r4k.o
arch/mips/mm/c-r4k.c: In function 'probe_scache':
arch/mips/mm/c-r4k.c:1078:6: error: variable 'tmp' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

Older GCC versions didn't warn about the unused variable tmp because it was
getting initialized.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
David Daney c54794d19e MIPS: Mask jump target in ftrace_dyn_arch_init_insns().
The current code is abusing the uasm interface by passing jump target
addresses with high bits set.  Mask the addresses to avoid annoying
messages at boot time.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wu Zhangjin <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/1922/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2011-05-10 18:15:22 +01:00
Joerg Roedel fffcda1183 x86, gart: Rename pci-gart_64.c to amd_gart_64.c
This file only contains code relevant for the northbridge
gart in AMD processors. This patch renames the file to
represent this fact in the filename.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 17:22:06 +02:00
Martin Schwidefsky badb8bb983 [S390] fix alloc_pgste check in init_new_context
Processes started with kernel_execve from a kernel thread will have
current->mm==NULL. Reading current->mm->context.alloc_pgste will
read a more or less random bit from lowcore in this case. If the
bit turns out to be set the whole process tree started this way
will allocate page table extensions although they have no need
for it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:43 +02:00
Martin Schwidefsky 3d8dcb3c76 [S390] oprofile: fix min/max interval query checks
oprofile_min_interval and oprofile_max_interval are unsigned, checking
for negative values doesn't work. Change hwsampler_query_min_interval
and hwsampler_query_max_interval to return an unsigned long and
check for a zero value instead.

Reported-by: Nicolas Kaiser <nikai@nikai.net>
Acked-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:43 +02:00
Michael Holzheu 83ace2701b [S390] replace diag10() with diag10_range() function
Currently the diag10() function can only release one page. For exploiters
that have to call diag10 on a contiguous memory region this is suboptimal.
This patch replaces the diag10() function with diag10_range() that is
able to release multiple pages. In addition to that the new function now
allows to release memory with addresses higher than 2047 MiB. This was
due to a restriction of the diagnose implementation under z/VM prior to
release 5.2.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:43 +02:00
Christian Borntraeger 91d378088b [S390] disassembler: handle b280/spp instruction
arch/s390/kvm/sie64a.S uses the b280 instruction. Tell the builtin
disassembler to handle that code.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:42 +02:00
Michael Holzheu 8eb4bd666f [S390] kernel: Initialize register 14 when starting new CPU
When starting a new CPU we currently jump to start_secondary() without
setting register 14 (the return address) correctly. Therefore on the stack
frame for start_secondary an invalid return address is stored. This leads
to wrong stack back traces in kernel dumps.

Example:

 #00 [1f33fe48] cpu_idle at 10614a
 #01 [1f33fe90] start_secondary at 54fa88
 #02 [1f33feb8] (null) at 0                 <--- invalid

To fix this start_secondary() is called now with basr/brasl that sets
register 14 correctly. The output of the stack backtrace looks then
like the following:

 #00 [1f33fe48] cpu_idle at 10614a
 #01 [1f33fe90] start_secondary at 54fa88
 #02 [1f33feb8] restart_base at 54f41e      <--- correct

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:42 +02:00
Ingo Molnar 932fed4e2e Merge commit 'v2.6.39-rc7' into perf/core
Merge reason: pull in the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 17:05:45 +02:00
Joerg Roedel 72fe00f01f x86/amd-iommu: Use threaded interupt handler
Move the interupt handling for the iommu into the interupt
thread to reduce latencies and prepare interupt handling for
pri handling.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 11:07:58 +02:00
Joerg Roedel 604c307bf4 Merge branches 'dma-debug/next', 'amd-iommu/command-cleanups', 'amd-iommu/ats' and 'amd-iommu/extended-features' into iommu/2.6.40
Conflicts:
	arch/x86/include/asm/amd_iommu_types.h
	arch/x86/kernel/amd_iommu.c
	arch/x86/kernel/amd_iommu_init.c
2011-05-10 10:25:23 +02:00
Joe Perches e969687595 arch/x86/kernel/pci-iommu_table.c: Convert sprintf_symbol to %pS
Coalesce format as well.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 10:21:35 +02:00
Jack Steiner 1d44e8288a x86, UV: Fix NMI handler for UV platforms
This fixes problems seen on UV systems handling NMIs from the
node controller.

I isolated the "dazed..." messages that I saw earlier to a bug in
the BMC on our platform. It was sending NMIs w/o properly setting
a register that indicated the source of NMI.

So rather than _assuming_ any unhandled NMI came from the UV system
maintenance console (SMC), add a check to verify that the SMC actually
sent the NMI.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: gorcunov@gmail.com
Cc: dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 09:26:55 +02:00
Matthew Garrett 935a638241 x86, efi: Ensure that the entirity of a region is mapped
It's possible for init_memory_mapping() to fail to map the entire region
if it crosses a boundary, so ensure that we complete the mapping.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-5-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:45 -07:00
Matthew Garrett 7cb00b7287 x86, efi: Pass a minimal map to SetVirtualAddressMap()
Experimentation with various EFI implementations has shown that functions
outside runtime services will still update their pointers if
SetVirtualAddressMap() is called with memory descriptors outside the
runtime area. This is obviously insane, and therefore is unsurprising.
Evidence from instrumenting another EFI implementation suggests that it
only passes the set of descriptors covering runtime regions, so let's
avoid any problems by doing the same. Runtime descriptors are copied to
a separate memory map, and only that map is passed back to the firmware.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-4-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:39 -07:00
Matthew Garrett 202f9d0a41 x86, efi: Merge contiguous memory regions of the same type and attribute
Some firmware implementations assume that physically contiguous regions
will be contiguous in virtual address space. This assumption is, obviously,
entirely unjustifiable. Said firmware implementations lack the good grace
to handle their failings in a measured and reasonable manner, instead
tending to shit all over address space and oopsing the kernel.

In an ideal universe these firmware implementations would simultaneously
catch fire and cease to be a problem, but since some of them are present
in attractively thin and shiny metal devices vanity wins out and some
poor developer spends an extended period of time surrounded by a
growing array of empty bottles until the underlying reason becomes
apparent. Said developer presents this patch, which simply merges
adjacent regions if they happen to be contiguous and have the same EFI
memory type and caching attributes.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-3-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:34 -07:00
Matthew Garrett 9cd2b07c19 x86, efi: Consolidate EFI nx control
The core EFI code and 64-bit EFI code currently have independent
implementations of code for setting memory regions as executable or not.
Let's consolidate them.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-2-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:29 -07:00
Matthew Garrett 2b5e8ef35b x86, efi: Remove virtual-mode SetVirtualAddressMap call
The spec says that SetVirtualAddressMap doesn't work once you're in
virtual mode, so there's no point in having infrastructure for calling
it from there.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-1-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:25 -07:00
Luis R. Rodriguez 8fab6af215 x86: Fix mrst sparse complaints
Fix these Sparse complaints:

  CHECK   arch/x86/platform/mrst/mrst.c
  arch/x86/platform/mrst/mrst.c:197:13: warning: symbol 'mrst_time_init' was not declared. Should it be static?
  arch/x86/platform/mrst/mrst.c:219:16: warning: symbol 'mrst_arch_setup' was not declared. Should it be static?

Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Roman Gezikov <roman.gezikov@atheros.com>
Cc: Joonas Viskari <joonas.viskari@atheros.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Allen Kao <allen.kao@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Link: http://lkml.kernel.org/r/1304719209-26913-1-git-send-email-lrodriguez@atheros.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-07 10:52:30 +02:00
Ingo Molnar 4cb1f43ce8 Merge commit 'v2.6.39-rc6' into x86/cleanups
Merge reason: move to a (much) newer upstream base.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-07 10:51:48 +02:00
Russell King 362607df9f Merge first four commits of 'zImage_fixes' of git://git.linaro.org/people/nico/linux into fixes 2011-05-07 08:34:40 +01:00
Nicolas Pitre ea9df3b168 ARM: zImage: the page table memory must be considered before relocation
For correctness, the initial page table located right before the
decompressed kernel should be considered when determining if relocation
is required.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
2011-05-07 00:07:58 -04:00
Nicolas Pitre adcc25915b ARM: zImage: make sure not to relocate on top of the relocation code
If the zImage load address is slightly below the relocation address,
there is a risk for the copied data to overwrite the copy loop or
cache flush code that the relocation process requires.  Always
bump the relocation address by the size of that code to avoid this
issue.

Noticed by Tony Lindgren <tony@atomide.com>.

While at it, let's start the copy from the restart symbol which makes
the above code size computation possible by the assembler directly
(same sections), given that we don't need to preserve the code before
that point anyway. And therefore we don't need to carry the _start
pointer in r5 anymore.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Tony Lindgren <tony@atomide.com>
2011-05-07 00:07:53 -04:00
Tony Lindgren 7c2527f0c4 ARM: zImage: Fix bad SP address after relocating kernel
Otherwise cache_clean_flush can overwrite some of the relocated
area depending on where the kernel image gets loaded. This fixes
booting on n900 after commit 6d7d0ae515
(ARM: 6750/1: improvements to compressed/head.S).

Thanks to Aaro Koskinen <aaro.koskinen@nokia.com> for debugging
the address of the relocated area that gets corrupted, and to
Nicolas Pitre <nicolas.pitre@linaro.org> for the other uncompress
related fixes.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-05-06 23:56:43 -04:00
Nicolas Pitre 3bd2cbb955 ARM: zImage: make sure the stack is 64-bit aligned
With ARMv5+ and EABI, the compiler expects a 64-bit aligned stack so
instructions like STRD and LDRD can be used.  Without this, mysterious
boot failures were seen semi randomly with the LZMA decompressor.

While at it, let's align .bss as well.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
CC: stable@kernel.org
2011-05-06 23:55:49 -04:00
Ingo Molnar 57d524154f Merge branch 'perf/stat' into perf/core
Merge reason: the perf stat improvements are tested and ready now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 21:07:38 +02:00
Rob Landley 6151658751 Correct occurrences of
- Documentation/kvm/ to Documentation/virtual/kvm
- Documentation/uml/ to Documentation/virtual/uml
- Documentation/lguest/ to Documentation/virtual/lguest
throughout the kernel source tree.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-05-06 09:27:55 -07:00
Peter Zijlstra 63b6a6758e perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions
The Intel Nehalem offcore bits implemented in:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

... are wrong: they implemented _ACCESS as _HIT and counted OTHER_CORE_HIT* as
MISS even though its clearly documented as an L3 hit ...

Fix them and the Westmere definitions as well.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1299119690-13991-3-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 11:24:48 +02:00
Frederic Weisbecker 925f83c085 hw_breakpoints, powerpc: Fix CONFIG_HAVE_HW_BREAKPOINT off-case in ptrace_set_debugreg()
We make use of ptrace_get_breakpoints() / ptrace_put_breakpoints() to
protect ptrace_set_debugreg() even if CONFIG_HAVE_HW_BREAKPOINT if off.
However in this case, these APIs are not implemented.

To fix this, push the protection down inside the relevant ifdef.
Best would be to export the code inside
CONFIG_HAVE_HW_BREAKPOINT into a standalone function to cleanup
the ifdefury there and call the breakpoint ref API inside. But
as it is more invasive, this should be rather made in an -rc1.

Fixes this build error:

  arch/powerpc/kernel/ptrace.c:1594: error: implicit declaration of function 'ptrace_get_breakpoints' make[2]: ***

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LPPC <linuxppc-dev@lists.ozlabs.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: v2.6.33.. <stable@kernel.org>
Link: http://lkml.kernel.org/r/1304639598-4707-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 11:24:46 +02:00
Lin Ming e04d1b23f9 perf events, x86: Add SandyBridge stalled-cycles-frontend/backend events
Extend the Intel SandyBridge PMU driver with definitions
for generic front-end and back-end stall events.

( As commit 3011203 "perf events, x86: Add Westmere stalled-cycles-frontend/backend
  events" says, these are only approximations. )

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1304666042-17577-1-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 09:37:03 +02:00
Ingo Molnar 4d70230bb4 Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into perf/urgent 2011-05-06 08:11:28 +02:00
Linus Torvalds bfd412db9e Merge branch 'for-linus' of git://github.com/at91linux/linux-2.6-at91
* 'for-linus' of git://github.com/at91linux/linux-2.6-at91:
  at91: Add ARCH_ID and basic cpu macros definition for 5series chips family.
  arm: at91: fix compiler warning for eb01 board build
  arm: at91: minimal defconfig for at91x40 SoC
  ARM: at91: AT91CAP9 has a macb device
2011-05-05 21:27:57 -07:00
Jack Miller a0496d450a powerpc: Add early debug for WSP platforms
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:32:41 +10:00
David Gibson a1d0d98daf powerpc: Add WSP platform
Add a platform for the Wire Speed Processor, based on the PPC A2.

This includes code for the ICS & OPB interrupt controllers, as well
as a SCOM backend, and SCOM based cpu bringup.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:32:35 +10:00
Richard A Lary 82578e192b powerpc/eeh: Display eeh error location for bus and device
For adapters which have devices under a PCIe switch/bridge it is informative
  to display information for both the PCIe switch/bridge and the device on
  which the bus error was detected.

  rebased to powerpc-next

Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:32:31 +10:00
Benjamin Herrenschmidt 40bd587a88 powerpc: Rename slb0_limit() to safe_stack_limit() and add Book3E support
slb0_limit() wasn't a very descriptive name. This changes it along with
a comment explaining what it's used for, and provides a 64-bit BookE
implementation.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:32:24 +10:00
Tseng-Hui (Frank) Lin 77eafe101a powerpc/pseries: Add support for IO event interrupts
This patch adds support for handling IO Event interrupts which come
through at the /event-sources/ibm,io-events device tree node.

The interrupts come through ibm,io-events device tree node are generated
by the firmware to report IO events. The firmware uses the same interrupt
to report multiple types of events for multiple devices. Each device may
have its own event handler. This patch implements a plateform interrupt
handler that is triggered by the IO event interrupts come through
ibm,io-events device tree node, pull in the IO events from RTAS and call
device event handlers registered in the notifier list.

Device event handlers are expected to use atomic_notifier_chain_register()
and atomic_notifier_chain_unregister() to register/unregister their
event handler in pseries_ioei_notifier_list list with IO event interrupt.
Device event handlers are responsible to identify if the event belongs
to the device event handler. The device event handle should return NOTIFY_OK
after the event is handled if the event belongs to the device event handler,
or NOTIFY_DONE otherwise.

Signed-off-by: Tseng-Hui (Frank) Lin <thlin@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:19:01 +10:00
Tseng-Hui (Frank) Lin 4cb4638079 powerpc/pseries: Add RTAS event log v6 definition
This patch adds definitions of non-IBM specific v6 extended log
definitions to rtas.h.

Signed-off-by: Tseng-Hui (Frank) Lin <tsenglin@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:18:59 +10:00
Stephen Rothwell 79af2187fa powerpc: Fix compile with icwsx support
Due to a collision between NO_CONTEXT->MMU_NO_CONTEXT change and
Anton's patch.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-06 13:18:34 +10:00
Anton Blanchard 228e548e60 net: Add sendmmsg socket system call
This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.

I wrote a microbenchmark to test the performance gains of using
this new syscall:

http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.

64B UDP

batch   pkts/sec
1       804570
2       872800 (+ 8 %)
4       916556 (+14 %)
8       939712 (+17 %)
16      952688 (+18 %)
32      956448 (+19 %)
64      964800 (+20 %)

64B raw socket

batch   pkts/sec
1       1201449
2       1350028 (+12 %)
4       1461416 (+22 %)
8       1513080 (+26 %)
16      1541216 (+28 %)
32      1553440 (+29 %)
64      1557888 (+30 %)

We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.

[ Add sparc syscall entries. -DaveM ]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-05 11:10:14 -07:00
Ingo Molnar 98bb318864 Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent 2011-05-04 20:33:42 +02:00
Randy Dunlap 9cddf15f18 x86/net: only select BPF_JIT when NET is enabled
Fix kconfig unmet dependency warning: HAVE_BPF_JIT depends on NET, so
make the "select" of it depend on NET also.

warning: (X86) selects HAVE_BPF_JIT which has unmet direct dependencies (NET)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04 11:06:05 -07:00
Dominik Brodowski 2d06d8c49a [CPUFREQ] use dynamic debug instead of custom infrastructure
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.

How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.

To enabled debugging during runtime, mount debugfs and

$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control

for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append

	ddebug_query="module cpufreq +p"

as a boot parameter to the kernel of your choice.

For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-05-04 11:50:57 -04:00
Naga Chumbalkar 904cc1e637 [CPUFREQ] Fix _OSC UUID in pcc-cpufreq
UUID needs to be written out the way it is described in
Sec 18.5.124 of ACPI 4.0a Specification.

Platform firmware's use of this UUID/_OSC is optional, which is
why we didn't notice this bug earlier.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
2011-05-04 11:50:56 -04:00
Richard A Lary ecb7390211 powerpc/pseries/eeh: Handle functional reset on non-PCIe device
Fundamental reset is an optional reset type supported only by PCIe adapters.
  Handle the unexpected case where a non-PCIe device has requested a
  fundamental reset. Try hot-reset as a fallback to handle this case.

Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 16:02:38 +10:00
Richard A Lary 308fc4f8e1 powerpc/pseries/eeh: Propagate needs_freset flag to device at PE
For multifunction adapters with a PCI bridge or switch as the device
  at the Partitionable Endpoint(PE), if one or more devices below PE
  sets dev->needs_freset, that value will be set for the PE device.

  In other words, if any device below PE requires a fundamental reset
  the PE will request a fundamental reset.

Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 16:02:36 +10:00
Brian King 9ee820fa00 powerpc/pseries: Add page coalescing support
Adds support for page coalescing, which is a feature on IBM Power servers
which allows for coalescing identical pages between logical partitions.
Hint text pages as coalesce candidates, since they are the most likely
pages to be able to be coalesced between partitions. This patch also
exports some page coalescing statistics available from firmware via
lparcfg.

[BenH: Moved a couple of things around to fix compile problems]

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 16:02:21 +10:00
Ben Hutchings 7707e4110e powerpc/kexec: Fix build failure on 32-bit SMP
Commit b987812b3f left
crash_kexec_wait_realmode() undefined for UP.

Commit 7c7a81b53e defined it for UP but
left it undefined for 32-bit SMP.

Seems like people are getting confused by nested #ifdef's, so move the
definitions of crash_kexec_wait_realmode() after the #ifdef CONFIG_SMP
section.

Compile-tested with 32-bit UP, 32-bit SMP and 64-bit SMP configurations.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:22:59 +10:00
KOSAKI Motohiro 104699c0ab powerpc: Convert old cpumask API into new one
Adapt new API.

Almost change is trivial. Most important change is the below line
because we plan to change task->cpus_allowed implementation.

-       ctx->cpus_allowed = current->cpus_allowed;

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:22:59 +10:00
Paul Mackerras 48404f2e95 powerpc: Save Come-From Address Register (CFAR) in exception frame
Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
Address Register" (CFAR), that records the address of the most recent
branch or rfid (return from interrupt) instruction for debugging purposes.

This saves the value of the CFAR in the exception entry code and stores
it in the exception frame.  We also make xmon print the CFAR value in
its register dump code.

Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
field, which is only used for system calls, and use it for the CFAR value
for all exceptions/interrupts other than system calls.  This means we
don't save the CFAR on system calls, which is not a great problem since
system calls tend not to happen unexpectedly, and also avoids adding the
overhead of reading the CFAR to the system call entry path.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:22:09 +10:00
Paul Mackerras 1977b50212 powerpc: Save register r9-r13 values accurately on interrupt with bad stack
When we take an interrupt or exception from kernel mode and the stack
pointer is obviously not a kernel address (i.e. the top bit is 0), we
switch to an emergency stack, save register values and panic.  However,
on 64-bit server machines, we don't actually save the values of r9 - r13
at the time of the interrupt, but rather values corrupted by the
exception entry code for r12-r13, and nothing at all for r9-r11.

This fixes it by passing a pointer to the register save area in the paca
through to the bad_stack code in r3.  The register values are saved in
one of the paca register save areas (depending on which exception this
is).  Using the pointer in r3, the bad_stack code now retrieves the
saved values of r9 - r13 and stores them in the exception frame on the
emergency stack.  This also stores the normal exception frame marker
("regshere") in the exception frame.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:19:27 +10:00
Tseng-Hui (Frank) Lin 851d2e2fe8 powerpc: Add Initiate Coprocessor Store Word (icswx) support
Icswx is a PowerPC instruction to send data to a co-processor. On Book-S
processors the LPAR_ID and process ID (PID) of the owning process are
registered in the window context of the co-processor at initialization
time. When the icswx instruction is executed the L2 generates a cop-reg
transaction on PowerBus. The transaction has no address and the
processor does not perform an MMU access to authenticate the transaction.
The co-processor compares the LPAR_ID and the PID included in the
transaction and the LPAR_ID and PID held in the window context to
determine if the process is authorized to generate the transaction.

The OS needs to assign a 16-bit PID for the process. This cop-PID needs
to be updated during context switch. The cop-PID needs to be destroyed
when the context is destroyed.

Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com>
Signed-off-by: Tseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:19:26 +10:00
Michael Neuling a32e252f7c powerpc: Use new CPU feature bit to select 2.06 tlbie
This removes MMU_FTR_TLBIE_206 as we can now use CPU_FTR_HVMODE_206.  It
also changes the logic to select which tlbie to use to be based on this
new CPU feature bit.

This also duplicates the ASM_FTR_IF/SET/CLR defines for CPU features
(copied from MMU features).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:19:26 +10:00
Gerald Schaefer 0200f3ecc1 crypto: s390 - add System z hardware support for CTR mode
This patch adds System z hardware acceleration support for AES, DES
and 3DES in CTR mode. The hardware support is available starting with
System z196.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-05-04 15:09:44 +10:00
Gerald Schaefer df1309ce95 crypto: s390 - add System z hardware support for GHASH
This patch adds System z hardware acceleration support for the GHASH
algorithm for GCM (Galois/Counter Mode).
The hardware support is available beginning with System z196.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-05-04 15:06:32 +10:00
Gerald Schaefer 99d9722215 crypto: s390 - add System z hardware support for XTS mode
This patch adds System z hardware acceleration support for the AES XTS mode.
The hardware support is available beginning with System z196.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-05-04 15:06:30 +10:00
Jan Glauber 98971f8439 crypto: s390 - cleanup DES code
Remove a stale file left over from 1efbd15c3b
and and cleanup the DES code a bit to make it easier to add new code.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-05-04 15:05:52 +10:00
Jan Glauber 1822bc9093 crypto: s390 - extend crypto facility check
The specification which crypto facility is required for an algorithm is added
as a parameter to the availability check which is done before an algorithm is
registered. With this change it is easier to add new algorithms that require
different facilities.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-05-04 15:05:49 +10:00
Grant Likely 476eb49126 powerpc/irq: Stop exporting irq_map
First step in eliminating irq_map[] table entirely

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 15:02:15 +10:00
Linus Torvalds 609cfda586 Merge branch 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
  xen/mmu: Add workaround "x86-64, mm: Put early page table high"
2011-05-03 09:25:42 -07:00
Linus Torvalds bab0dcc717 Merge branches 'x86-fixes-for-linus' and 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, reboot: Fix relocations in reboot_32.S
  x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVEL
2011-05-03 09:23:44 -07:00
Linus Torvalds 5933f2ae35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  sysctl: net: call unregister_net_sysctl_table where needed
  Revert: veth: remove unneeded ifname code from veth_newlink()
  smsc95xx: fix reset check
  tg3: Fix failure to enable WoL by default when possible
  networking: inappropriate ioctl operation should return ENOTTY
  amd8111e: trivial typo spelling: Negotitate -> Negotiate
  ipv4: don't spam dmesg with "Using LC-trie" messages
  af_unix: Only allow recv on connected seqpacket sockets.
  mii: add support of pause frames in mii_get_an
  net: ftmac100: fix scheduling while atomic during PHY link status change
  usbnet: Transfer of maintainership
  usbnet: add support for some Huawei modems with cdc-ether ports
  bnx2: cancel timer on device removal
  iwl4965: fix "Received BA when not expected"
  iwlagn: fix "Received BA when not expected"
  dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
  usbnet: Resubmit interrupt URB if device is open
  iwl4965: fix "TX Power requested while scanning"
  iwlegacy: led stay solid on when no traffic
  b43: trivial: update module info about ucode16_mimo firmware
  ...
2011-05-02 18:00:43 -07:00
H. Peter Anvin 7806a49ab6 x86, reboot: Fix relocations in reboot_32.S
The use of base for %ebx in this file is arbitrary, *except* that we
also use it to compute the real-mode segment.  Therefore, make it so
that r_base really is the true address to which %ebx points.

This resolves kernel bugzilla 33302.

Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org
2011-05-02 14:44:46 -07:00
Stefano Stabellini b9269dc7bf xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02 16:33:52 -04:00
Konrad Rzeszutek Wilk a38647837a xen/mmu: Add workaround "x86-64, mm: Put early page table high"
As a consequence of the commit:

commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

it causes the Linux kernel to crash under Xen:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but lets ignore that).

Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].

When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.

And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02 16:33:34 -04:00
Linus Torvalds 625a3b6057 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: (47 commits)
  CLKDEV: Fix clkdev return value for NULL clk case
  ARM: 6891/1: prevent heap corruption in OABI semtimedop
  ARM: kprobes: Tidy-up kprobes-decode.c
  ARM: kprobes: Add emulation of hint instructions like NOP and WFI
  ARM: kprobes: Add emulation of SBFX, UBFX, BFI and BFC instructions
  ARM: kprobes: Add emulation of MOVW and MOVT instructions
  ARM: kprobes: Reject probing of undefined data processing instructions
  ARM: kprobes: Remove redundant code in space_1111
  ARM: kprobes: Fix emulation of PLD instructions
  ARM: kprobes: Reject probing of SETEND instructions
  ARM: kprobes: Consolidate stub decoding functions
  ARM: kprobes: Reject probing of all coprocessor instructions
  ARM: kprobes: Fix emulation of USAD8 instructions
  ARM: kprobes: Fix emulation of SMUAD, SMUSD and SMMUL instructions
  ARM: kprobes: Fix emulation of SXTB16, SXTB, SXTH, UXTB16, UXTB and UXTH instructions
  ARM: kprobes: Reject probing of undefined media instructions
  ARM: kprobes: Add emulation of RBIT instruction
  ARM: kprobes: Reject probing of LDRB instructions which load PC
  ARM: kprobes: Fix emulation of LDRD and STRD instructions
  ARM: kprobes: Reject probing of LDR/STR instructions which update PC unpredictably
  ...
2011-05-02 12:17:05 -07:00
Linus Torvalds 96f3ee2805 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] irqstats: fix counting of pfault, dasd diag and virtio irqs
  [S390] prng: fix pointer arithmetic
2011-05-02 08:47:35 -07:00
Yinghai Lu e5a10c1bd1 x86, NUMA: Trim numa meminfo with max_pfn in a separate loop
During testing 32bit numa unifying code from tj, found one system with
more than 64g fails to use numa.  It turns out we do not trim numa
meminfo correctly against max_pfn in case start address of a node is
higher than 64GiB.  Bug fix made it to tip tree.

This patch moves the checking and trimming to a separate loop.  So we
don't need to compare low/high in following merge loops.  It makes the
code more readable.

Also it makes the node merge printouts less strange.  On a 512GiB numa
system with 32bit,

before:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000)

after:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000)

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[Updated patch description and comment slightly.]
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 17:24:49 +02:00
Yinghai Lu a56bca80db x86, NUMA: Rename setup_node_bootmem() to setup_node_data()
After using memblock to replace bootmem, that function only sets up
node_data now.

Change the name to reflect what it actually does.

tj: Minor adjustment to the patch description.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 17:24:49 +02:00
Tejun Heo 1b7e03ef75 x86, NUMA: Enable emulation on 32bit too
Now that NUMA init path is unified, NUMA emulation can be enabled on
32bit.  Make numa_emluation.c safe on 32bit by doing the followings.

* Define MAX_DMA32_PFN on 32bit too.

* Include bootmem.h for max_pfn declaration.

* Use u64 explicitly and always use PFN_PHYS() when converting page
  number to address.

* Avoid __udivdi3() generation on 32bit by doing number of pages
  calculation instead in split_nodes_interleave().

And drop X86_64 dependency from Kconfig.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 2706a0bf7b x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too
Now that NUMA init path is unified, amdtopology can be enabled on
32bit.  Make amdtopology.c safe on 32bit by explicitly using u64 and
drop X86_64 dependency from Kconfig.

Inclusion of bootmem.h is added for max_pfn declaration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo c6f5887820 x86, NUMA: Rename amdtopology_64.c to amdtopology.c
amdtopology is going to be used by 32bit too drop _64 suffix.  This is
pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 752d4f372f x86, NUMA: Make numa_init_array() static
numa_init_array() no longer has users outside of numa.c.  Make it
static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo bd6709a91a x86, NUMA: Make 32bit use common NUMA init path
With both _numa_init() methods converted and the rest of init code
adjusted, numa_32.c now can switch from the 32bit only init code to
the common one in numa.c.

* Shim get_memcfg_*()'s are dropped and initmem_init() calls
  x86_numa_init(), which is updated to handle NUMAQ.

* All boilerplate operations including node range limiting, pgdat
  alloc/init are handled by numa_init().  32bit only implementation is
  removed.

* 32bit numa_add_memblk(), numa_set_distance() and
  memory_add_physaddr_to_nid() removed and common versions in
  numa_32.c enabled for 32bit.

This change causes the following behavior changes.

* NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized
  for 32bit too.

* Much more sanity checks and configuration cleanups.

* Proper handling of node distances.

* The same NUMA init messages as 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 7888e96b26 x86, NUMA: Initialize and use remap allocator from setup_node_bootmem()
setup_node_bootmem() is taken from 64bit and doesn't use remap
allocator.  It's about to be shared with 32bit so add support for it.
If NODE_DATA is remapped, it's noted in the debug message and node
locality check is skipped as the __pa() of the remapped address
doesn't reflect the actual physical address.

On 64bit, remap allocator becomes noop and doesn't affect the
behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:54 +02:00
Tejun Heo 99cca492ea x86-32, NUMA: Add @start and @end to init_alloc_remap()
Instead of dereferencing node_start/end_pfn[] directly, make
init_alloc_remap() take @start and @end and let the caller be
responsible for making sure the range is sane.  This is to prepare for
use from unified NUMA init code.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:54 +02:00
Tejun Heo 38f3e1ca24 x86, NUMA: Remove long 64bit assumption from numa.c
Code moved from numa_64.c has assumption that long is 64bit in several
places.  This patch removes the assumption by using {s|u}64_t
explicity, using PFN_PHYS() for page number -> addr conversions and
adjusting printf formats.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 744baba0c4 x86, NUMA: Enable build of generic NUMA init code on 32bit
Generic NUMA init code was moved to numa.c from numa_64.c but is still
guaraded by CONFIG_X86_64.  This patch removes the compile guard and
enables compiling on 32bit.

* numa_add_memblk() and numa_set_distance() clash with the shim
  implementation in numa_32.c and are left out.

* memory_add_physaddr_to_nid() clashes with 32bit implementation and
  is left out.

* MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32.

* node_data definition in numa_32.c removed in favor of the one in
  numa.c.

There are places where ulong is assumed to be 64bit.  The next patch
will fix them up.  Note that although the code is compiled it isn't
used yet and this patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo a4106eae65 x86, NUMA: Move NUMA init logic from numa_64.c to numa.c
Move the generic 64bit NUMA init machinery from numa_64.c to numa.c.

* node_data[], numa_mem_info and numa_distance
* numa_add_memblk[_to](), numa_remove_memblk[_from]()
* numa_set_distance() and friends
* numa_init() and all the numa_meminfo handling helpers called from it
* dummy_numa_init()
* memory_add_physaddr_to_nid()

A new function x86_numa_init() is added and the content of
numa_64.c::initmem_init() is moved into it.  initmem_init() now simply
calls x86_numa_init().

Constants and numa_off declaration are moved from numa_{32|64}.h to
numa.h.

This is code reorganization and doesn't involve any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 299a180aec x86-32, NUMA: Update numaq to use new NUMA init protocol
Update numaq such that it calls numa_add_memblk() and sets
numa_nodes_parsed instead of directly diddling with NUMA states.  The
original get_memcfg_numaq() is renamed to numaq_numa_init() and new
get_memcfg_numaq() is created in numa_32.c.

The shim numa_add_memblk() implementation handles node_start/end_pfn[]
and node_set_online() for nodes with memory.  The new
get_memcfg_numaq() exactly the same with get_memcfg_from_srat() other
than calling the numaq init function.  Things get_memcfgs_numaq() do
are not strictly necessary for numaq but added for consistency and to
help unifying NUMA init handling.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 5acd91ab83 x86-32, NUMA: Replace srat_32.c with srat.c
SRAT support implementation in srat_32.c and srat.c are generally
similar; however, there are some differences.

First of all, 64bit implementation supports more types of SRAT
entries.  64bit supports x2apic, affinity, memory and SLIT.  32bit
only supports processor and memory.

Most other differences stem from different initialization protocols
employed by 64bit and 32bit NUMA init paths.

On 64bit,

* Mappings among PXM, node and apicid are directly done in each SRAT
  entry callback.

* Memory affinity information is passed to numa_add_memblk() which
  takes care of all interfacing with NUMA init.

* Doesn't directly initialize NUMA configurations.  All the
  information is recorded in numa_nodes_parsed and memblks.

On 32bit,

* Checks numa_off.

* Things go through one more level of indirection via private tables
  but eventually end up initializing the same mappings.

* node_start/end_pfn[] are initialized and
  memblock_x86_register_active_regions() is called for each memory
  chunk.

* node_set_online() is called for each online node.

* sort_node_map() is called.

There are also other minor differences in sanity checking and messages
but taking 64bit version should be good enough.

This patch drops the 32bit specific implementation and makes the 64bit
implementation common for both 32 and 64bit.

The init protocol differences are dealt with in two places - the
numa_add_memblk() shim added in the previous patch and new temporary
numa_32.c:get_memcfg_from_srat() which wraps invocation of
x86_acpi_numa_init().

The shim numa_add_memblk() handles the folowings.

* node_start/end_pfn[] initialization.

* node_set_online() for memory nodes.

* Invocation of memblock_x86_register_active_regions().

The shim get_memcfg_from_srat() handles the followings.

* numa_off check.

* node_set_online() for CPU nodes.

* sort_node_map() invocation.

* Clearing of numa_nodes_parsed and active_ranges on failure.

The shims are temporary and will be removed as the generic NUMA init
path in 32bit is replaced with 64bit one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo b0d310801a x86-32, NUMA: implement temporary NUMA init shims
To help transition to common NUMA init, implement temporary 32bit
shims for numa_add_memblk() and numa_set_distance().
numa_add_memblk() registers the memblk and adjusts
node_start/end_pfn[].  numa_set_distance() is noop.

These shims will allow using 64bit NUMA init functions on 32bit and
gradual transition to common NUMA init path.

For detailed description, please read description of commits which
make use of the shim functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo e6df595b37 x86, NUMA: Move numa_nodes_parsed to numa.[hc]
Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for
NUMA init path unification.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo daf4f480ae x86-32, NUMA: Move get_memcfg_numa() into numa_32.c
There's no reason get_memcfg_numa() to be implemented inline in
mmzone_32.h.  Move it to numa_32.c and also make
get_memcfg_numa_flag() static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo eca9ad3132 x86, NUMA: make srat.c 32bit safe
Make srat.c 32bit safe by removing the assumption that unsigned long
is 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 7b2600f8ee x86, NUMA: rename srat_64.c to srat.c
Rename srat_64.c to srat.c.  This is to prepare for unification of
NUMA init paths between 32 and 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 1201e10a09 x86, NUMA: trivial cleanups
* Kill no longer used struct bootnode.

* Kill dangling declaration of pxm_to_nid() in numa_32.h.

* Make setup_node_bootmem() static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 797390d855 x86-32, NUMA: use sparse_memory_present_with_active_regions()
Instead of calling memory_present() for each region from NUMA init,
call sparse_memory_present_with_active_regions() from paging_init()
similarly to x86-64.

For flat and numaq, this results in exactly the same memory_present()
calls.  For srat, if there are multiple memory chunks for a node,
after this change, memory_present() will be called separately for each
chunk instead of being called once to encompass the whole range, which
doesn't cause any harm and actually is the better behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 84914ed0ec x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional
NUMAQ is the only meaningful user of this callback and
setup_local_APIC() the only callsite.  Stop torturing everyone else by
making the callback optional and removing all the boilerplate
implementations and assignments.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 6bd262731b x86, NUMA: Unify 32/64bit numa_cpu_node() implementation
Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is
NUMAQ which returns valid mapping only after CPU is initialized during
SMP bringup; thus, the previous patch to set apicid -> node in
setup_local_APIC() makes __apicid_to_node[] always contain the correct
mapping whether custom apic->x86_32_numa_cpu_node() is used or not.

So, there is no reason to keep separate 32bit implementation.  We can
always consult __apicid_to_node[].  Move 64bit implementation from
numa_64.c to numa.c and remove 32bit implementation from numa_32.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo c4b90c1199 x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC()
Some x86-32 NUMA implementations (NUMAQ) don't initialize apicid ->
node mapping using set_apicid_to_node() during NUMA init but implement
custom apic->x86_32_numa_cpu_node() instead.

This patch automatically initializes the default apic -> node mapping
table from apic->x86_32_numa_cpu_node() from setup_local_APIC() such
that the mapping table is in sync with the actual mapping.

As the table isn't used by custom implementations, this doesn't make
any difference at this point.  This is in preparation of unifying
numa_cpu_node() between x86-32 and 64.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo acd26d611e x86-64, NUMA: simplify nodedata allocation
With top-down memblock allocation, the allocation range limits in
ealry_node_mem() can be simplified - try node-local first, then any
node but in any case don't allocate below DMA limit.

Remove early_node_mem() and implement simplified allocation directly
in setup_node_bootmem().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:51 +02:00
Tejun Heo ebe685f24e x86-64, NUMA: trivial cleanups for setup_node_bootmem()
Make the following trivial changes in preparation for further updates.

* nodeid -> nid, nid -> tnid
* use nd_ prefix for nodedata related variables
* remove start/end_pfn and use start/end directly

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:51 +02:00
Tejun Heo 9688678a66 x86-64, NUMA: Simplify hotadd memory handling
The only special handling NUMA needs to do for hotadd memory is
determining the node for the hotadd memory given the address of it and
there's nothing specific to specific config method used.

srat_64.c does somewhat elaborate error checking on
ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements
memory_add_physaddr_to_nid() which determines the node for given
hotadd address.

This is almost completely redundant.  All the information is already
available to the generic NUMA code which already performs all the
sanity checking and merging.  All that's necessary is not using
__initdata from numa_meminfo and providing a function which uses it to
map address to node.

Drop the specific implementation from srat_64.c and add generic
memory_add_physaddr_to_nid() in numa_64.c, which is enabled if
CONFIG_MEMORY_HOTPLUG is set.  Other than dropping the code, srat_64.c
doesn't need any change as it already calls numa_add_memblk() for hot
pluggable regions which is enough.

While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to
CONFIG_MEMORY_HOTPLUG, for NUMA on x86-64, the two are always the
same.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:51 +02:00
Tejun Heo ba67cf5cf2 Merge branch 'x86/urgent' into x86-mm
Merge reason: Pick up the following two fix commits.

  2be19102b7: x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  765af22da8: x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change

Scheduled NUMA init 32/64bit unification changes depend on these.

Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 14:16:47 +02:00
Tejun Heo aff364860a Merge branch 'x86/numa' into x86-mm
Merge reason: Pick up x86-32 remap allocator cleanup changes - 14
commits, 3fe14ab541^..993ba1585c.

  3fe14ab541: x86-32, numa: Fix failure condition check in alloc_remap()
  993ba1585c: x86-32, numa: Update remap allocator comments

Scheduled NUMA init 32/64bit unification changes depend on them.

Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 14:08:47 +02:00
Bart Van Assche 9de4966a4d x86: Fix spelling error in the memcpy() source code comment
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/201105011409.21629.bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01 19:16:18 +02:00
Yinghai Lu 2be19102b7 x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
numa_cleanup_meminfo() trims each memblk between low (0) and
high (max_pfn) limits and discards empty ones.  However, the
emptiness detection incorrectly used equality test.  If the
start of a memblk is higher than max_pfn, it is empty but fails
the equality test and doesn't get discarded.

The condition triggers when max_pfn is lower than start of a
NUMA node and results in memory misconfiguration - leading to
WARN_ON()s and other funnies.  The bug was discovered in devel
branch where 32bit too uses this code path for NUMA init.  If a
node is above the addressing limit, max_pfn ends up lower than
the node triggering this problem.

The failure hasn't been observed on x86-64 but is still possible
with broken hardware e820/NUMA info.  As the fix is very low
risk, it would be better to apply it even for 64bit.

Fix it by using >= instead of ==.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Extracted the actual fix from the original patch and rewrote patch description. ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01 19:15:11 +02:00
Ingo Molnar 809435ff4f Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2011-05-01 19:09:39 +02:00
Boris Ostrovsky e20a2d205c x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors
Older AMD K8 processors (Revisions A-E) are affected by erratum
400 (APIC timer interrupts don't occur in C states greater than
C1). This, for example, means that X86_FEATURE_ARAT flag should
not be set for these parts.

This addresses regression introduced by commit
b87cf80af3 ("x86, AMD: Set ARAT
feature on AMD processors") where the system may become
unresponsive until external interrupt (such as keyboard input)
occurs. This results, for example, in time not being reported
correctly, lack of progress on the system and other lockups.

Reported-by: Joerg-Volker Peetz <jvpeetz@web.de>
Tested-by: Joerg-Volker Peetz <jvpeetz@web.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01 18:55:51 +02:00
Rafael J. Wysocki 85eb8c8d0b PM / Runtime: Generic clock manipulation rountines for runtime PM (v6)
Many different platforms and subsystems may want to disable device
clocks during suspend and enable them during resume which is going to
be done in a very similar way in all those cases.  For this reason,
provide generic routines for the manipulation of device clocks during
suspend and resume.

Convert the ARM shmobile platform to using the new routines.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-04-30 00:25:44 +02:00
Linus Torvalds 40a963502c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86, nmi: Move LVT un-masking into irq handlers
  perf events, x86: Work around the Nehalem AAJ80 erratum
  perf, x86: Fix BTS condition
  ftrace: Build without frame pointers on Microblaze
2011-04-29 15:08:53 -07:00
Linus Torvalds a6ab948e65 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: ce4100: Configure IOAPIC pins for USB and SATA to level type
  x86: devicetree: Configure IOAPIC pin only once
  x86, setup: When probing memory with e801, use ax/bx as a pair
2011-04-29 15:07:19 -07:00
Mike Waychison f548ccd47d x86: Better comments for get_bios_ebda()
Make the comments a bit clearer for get_bios_ebda so that it actually
tells us what it is returning.

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-29 14:13:15 -07:00
Mike Waychison 57d5f9f808 x86: get_bios_ebda_length()
Add a wrapper routine that tells us the length of the EBDA if it is
present.  This guy also ensures that the returned length doesn't let the
EBDA run past the 640KiB mark.

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-29 14:13:15 -07:00
David Decotigny 8ae6daca85 ethtool: Call ethtool's get/set_settings callbacks with cleaned data
This makes sure that when a driver calls the ethtool's
get/set_settings() callback of another driver, the data passed to it
is clean. This guarantees that speed_hi will be zeroed correctly if
the called callback doesn't explicitely set it: we are sure we don't
get a corrupted speed from the underlying driver. We also take care of
setting the cmd field appropriately (ETHTOOL_GSET/SSET).

This applies to dev_ethtool_get_settings(), which now makes sure it
sets up that ethtool command parameter correctly before passing it to
drivers. This also means that whoever calls dev_ethtool_get_settings()
does not have to clean the ethtool command parameter. This function
also becomes an exported symbol instead of an inline.

All drivers visible to make allyesconfig under x86_64 have been
updated.

Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-29 14:01:30 -07:00
Linus Torvalds 9748d4d2b4 Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  OMAP3+: voltage: remove initial voltage
  OMAP4: Intialize IVA Device in addition to DSP device.
  omap: rx51: mark reserved memory earlier
  OMAP3: l3: fix for "irq 10: nobody cared" message
  arm: omap2: enable smc instruction for sleep34xx
  OMAP2/3: hwmod: fix gpio-reset timeouts seen during bootup.
  OMAP3: PM: Do not rely on ROM code to restore CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLL
  OMAP2+: PM: Fix the saving of CM_AUTOIDLE_PLL register on scratchpad area
  OMAP4: clock data: Change DSS clock aliases
  OMAP2+: hwmod data: Fix wrong dma_system end address
2011-04-29 07:54:48 -07:00
Dan Rosenberg 0f22072ab5 ARM: 6891/1: prevent heap corruption in OABI semtimedop
When CONFIG_OABI_COMPAT is set, the wrapper for semtimedop does not
bound the nsops argument.  A sufficiently large value will cause an
integer overflow in allocation size, followed by copying too much data
into the allocated buffer.  Fix this by restricting nsops to SEMOPM.
Untested.

Cc: stable@kernel.org
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-04-29 15:53:14 +01:00
Ingo Molnar 301120396b perf events, x86: Add Westmere stalled-cycles-frontend/backend events
Extend the Intel Westmere PMU driver with definitions for generic front-end and
back-end stall events.

( These are only approximations. )

Reported-by: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n008io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 16:22:37 +02:00
Ingo Molnar 91fc4cc000 perf, x86: Add new stalled cycles events for Intel and AMD CPUs
Extend the Intel and AMD event definitions with generic front-end and
back-end stall events.

( These are only approximations - suggestions are welcome for better events. )

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n001io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 14:24:15 +02:00
Ingo Molnar 8f62242246 perf events: Add generic front-end and back-end stalled cycle event definitions
Add two generic hardware events: front-end and back-end stalled cycles.

These events measure conditions when the CPU is executing code but its
capabilities are not fully utilized. Understanding such situations and
analyzing them is an important sub-task of code optimization workflows.

Both events limit performance: most front end stalls tend to be caused
by branch misprediction or instruction fetch cachemisses, backend
stalls can be caused by various resource shortages or inefficient
instruction scheduling.

Front-end stalls are the more important ones: code cannot run fast
if the instruction stream is not being kept up.

An over-utilized back-end can cause front-end stalls and thus
has to be kept an eye on as well.

The exact composition is very program logic and instruction mix
dependent.

We use the terms 'stall', 'front-end' and 'back-end' loosely and
try to use the best available events from specific CPUs that
approximate these concepts.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n000io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 14:23:58 +02:00
Russell King 408133e9dc Merge branch 'kprobes' of git://git.linaro.org/people/nico/linux into fixes 2011-04-29 11:02:45 +01:00
Heiko Carstens a985183285 [S390] irqstats: fix counting of pfault, dasd diag and virtio irqs
pfault, dasd diag and virtio all use the same external interrupt number.
The respective interrupt handlers decide by the subcode if they are
meant to handle the interrupt.
Counting is currently done before looking at the subcode which means
each handler counts an interrupt even if it is not handling it.
Fix this by moving the kstat code after the code which looks at the
subcode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-29 10:42:25 +02:00
Tim Gardner c7a7b814c9 ioremap: Delay sanity check until after a successful mapping
While tracking down the reason for an ioremap() failure I was
distracted  by the WARN_ONCE() in __ioremap_caller().

Performing a WARN_ONCE() sanity check before the mapping
is successful seems pointless if the caller sends bad values.

A case in point is when the BIOS provides erroneous screen_info
values causing vesafb_probe() to request an outrageuous size.
The WARN_ONCE is then wasted on bogosity. Move the warning to a
point where the mapping has been successfully allocated.

Addresses:

  http://bugs.launchpad.net/bugs/772042

Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Link: http://lkml.kernel.org/r/4DB99D2E.9080106@canonical.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-29 08:02:47 +02:00
Jon Medhurst cdc2536115 ARM: kprobes: Tidy-up kprobes-decode.c
- Remove coding standard violations reported by checkpatch.pl
- Delete comment about handling of conditional branches which is no
  longer true.
- Delete comment at end of file which lists all ARM instructions. This
  duplicates data available in the ARM ARM and seems like an
  unnecessary maintenance burden to keep this up to date and accurate.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:41:01 -04:00
Jon Medhurst 9425493078 ARM: kprobes: Add emulation of hint instructions like NOP and WFI
Being able to probe NOP instructions is useful for hard-coding probeable
locations and is used by the kprobes test code.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:41:01 -04:00
Jon Medhurst 20e8155e24 ARM: kprobes: Add emulation of SBFX, UBFX, BFI and BFC instructions
These bit field manipulation instructions occur several thousand
times in an ARMv7 kernel.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:41:00 -04:00
Jon Medhurst c9836777d5 ARM: kprobes: Add emulation of MOVW and MOVT instructions
The MOVW and MOVT instructions account for approximately 7% of all
instructions in a ARMv7 kernel as GCC uses them instead of a literal
pool.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:59 -04:00
Jon Medhurst f704a6e25b ARM: kprobes: Reject probing of undefined data processing instructions
The instruction decoding in space_cccc_000x needs to reject probing of
instructions with undefined patterns as they may in future become
defined and then emulated faultily - as has already happened with the
SMC instruction.

This fix is achieved by testing for the instruction patterns we want to
probe and making the the default fall-through paths reject probes. This
also allows us to remove some explicit tests for instructions that we
wish to reject, as that is now the default action.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:59 -04:00
Jon Medhurst 72c2bab2be ARM: kprobes: Remove redundant code in space_1111
The tests to explicitly reject probing CPS, RFE and SRS instructions
are redundant as the default case is now to reject undecoded patterns.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:59 -04:00
Jon Medhurst 41713d1396 ARM: kprobes: Fix emulation of PLD instructions
The PLD instructions wasn't being decoded correctly and the emulation
code wasn't adjusting PC correctly.

As the PLD instruction is only a performance hint we emulate it as a
simple nop, and we can broaden the instruction decoding to take into
account newer PLI and PLDW instructions.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:59 -04:00
Jon Medhurst f0aeb8bff0 ARM: kprobes: Reject probing of SETEND instructions
The emulation of SETEND was broken as it changed the endianess for
the running kprobes handling code. Rather than adding a new simulation
routine to fix this we'll just reject probing of SETEND as these should
be very rare in the kernel.

Note, the function emulate_none is now unused but it is left in the
source code as future patches will use it.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:59 -04:00
Jon Medhurst ac211c6994 ARM: kprobes: Consolidate stub decoding functions
Following the change to remove support for coprocessor instructions
we are left with three stub functions which can be consolidated.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:59 -04:00
Jon Medhurst fa1a03b429 ARM: kprobes: Reject probing of all coprocessor instructions
The kernel doesn't currently support VFP or Neon code, and probing of
code with CP15 operations is fraught with bad consequences. Therefore we
don't need the ability to probe coprocessor instructions and the code to
support this can be removed.

The removed code also had at least two bugs:
 - MRC into R15 should set CPSR not trash PC
 - LDC and STC which use PC as base register needed the address offset by 8

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:58 -04:00
Jon Medhurst c6e4ae3291 ARM: kprobes: Fix emulation of USAD8 instructions
The USAD8 instruction wasn't being explicitly decoded leading
to the incorrect emulation routine being called. It can be correctly
decoded in the same way as the signed multiply instructions so we move
the decoding there.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:58 -04:00
Jon Medhurst 038c3839c9 ARM: kprobes: Fix emulation of SMUAD, SMUSD and SMMUL instructions
The signed multiply instructions were being decoded incorrectly.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:58 -04:00
Jon Medhurst 8dd7cfbed8 ARM: kprobes: Fix emulation of SXTB16, SXTB, SXTH, UXTB16, UXTB and UXTH instructions
These sign extension instructions are encoded as extend-and-add
instructions where the register to add is specified as r15. The decoding
routines weren't checking for this and were using the incorrect
emulation code, giving incorrect results.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:58 -04:00
Jon Medhurst 780b5c1162 ARM: kprobes: Reject probing of undefined media instructions
The instructions space for media instructions contains some undefined
patterns. We need to reject probing of these because they may in future
become defined and the kprobes code may then emulate them faultily.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:58 -04:00
Jon Medhurst 0e384ed164 ARM: kprobes: Add emulation of RBIT instruction
The v6T2 RBIT instruction was accidentally being emulated correctly,
this patch adds correct decoding for the instruction.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:57 -04:00
Jon Medhurst 81ff5720b9 ARM: kprobes: Reject probing of LDRB instructions which load PC
These instructions are specified as UNPREDICTABLE.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:57 -04:00
Jon Medhurst 5c6b76fc7d ARM: kprobes: Fix emulation of LDRD and STRD instructions
The decoding of these instructions got the register indexed and
immediate indexed forms the wrong way around, causing incorrect
emulation.

Instructions like "LDRD Rx, [Rx]" were corrupting Rx because the base
register writeback was being performed unconditionally, overwriting the
value just loaded from memory. The fix is to only writeback the base
register when that form of the instruction is used. Note, now that we
reject probing writeback with PC the emulation code doesn't need the
check rn!=15.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:57 -04:00
Jon Medhurst 54823accfc ARM: kprobes: Reject probing of LDR/STR instructions which update PC unpredictably
Using PC as an base register with writeback is UNPREDICTABLE, as is non
word-sized loads or stores of PC. (We only really care about preventing
loads to PC but it keeps the code simpler if we also exclude stores.)

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:57 -04:00
Jon Medhurst 6823fc85fc ARM: kprobes: Fix emulation of LDRH, STRH, LDRSB and LDRSH instructions
The decoding of these instructions got the register indexed and
immediate indexed forms the wrong way around, causing incorrect
emulation.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:56 -04:00
Jon Medhurst ec58d7f237 ARM: kprobes: Reject probing of STREX and LDREX instructions
The emulation code for STREX and LDREX instructions is faulty, however,
rather than attempting to fix this we reject probes of these
instructions. We do this because they can never succeed in gaining
exclusive access as the exception framework clears the exclusivity
monitor when a probes breakpoint is hit. (This is a general problem
when probing all instructions executing between a LDREX and its
corresponding STREX and can lead to infinite retry loops.)

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:56 -04:00
Jon Medhurst ba48d40713 ARM: kprobes: Reject probing of undefined multiply instructions
The instructions space for 'Multiply and multiply-accumulate'
instructions contains some undefined patterns. We need to reject
probing of these because they may in future become defined and the
kprobes code may then emulate them faultily.

This has already happened with the new MLS instruction which this patch
also adds correct decoding for as well as tightening up other decoding
tests. (Before this patch the wrong emulation routine was being called
for MLS though it still produced correct results.)

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:56 -04:00
Jon Medhurst 75539aea4c ARM: kprobes: Fix error in comment
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:56 -04:00
Jon Medhurst 983ebd9365 ARM: kprobes: Reject probing of instructions which write to PC unpredictably.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:55 -04:00
Jon Medhurst c412aba2a1 ARM: kprobes: Fix emulation of MRS instruction
The MRS instruction should set mode and interrupt bits in the read value
so it is simpler to use a new simulation routine (simulate_mrs) rather
than some modified emulation.

prep_emulate_rd12 is now unused and removed.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:55 -04:00
Jon Medhurst 51468ea91e ARM: kprobes: Reject probing MRS instructions which read SPSR
We need to reject probing of instructions which read SPSR because
we can't handle this as the value in SPSR is lost when the exception
handler for the probe breakpoint first runs.

This patch also fixes the bitmask for MRS instructions decoding to
include checking bits 5-7.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-04-28 23:40:55 -04:00