57027 Commits

Author SHA1 Message Date
Chris Metcalf 8aaf1dda42 arch/tile: use better definitions of xchg() and cmpxchg()
These definitions use a ({}) construct to avoid some cases where
we were getting warnings about unused return values.  We also
promote the definition to the common <asm/atomic.h>, since it applies
to both the 32- and 64-bit atomics.

In addition, define __HAVE_ARCH_CMPXCHG for TILE-Gx since it has
efficient direct atomic instructions.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-19 22:55:49 -04:00
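
To illustrate the ({}) construct that 8aaf1dda42 describes, here is a minimal user-space sketch. It assumes GCC or Clang (statement expressions, __typeof__ and the __atomic builtins). The demo_xchg() and demo_cmpxchg() names are hypothetical, and this is not the arch/tile implementation.

```c
/*
 * Sketch only: demo_xchg()/demo_cmpxchg() are hypothetical names built on
 * the GCC/Clang __atomic builtins, not on TILE instructions.
 */

/* Atomically store val in *ptr and yield the previous value. */
#define demo_xchg(ptr, val)                                          \
({                                                                   \
        __typeof__(*(ptr)) demo_new_ = (val);                        \
        __atomic_exchange_n((ptr), demo_new_, __ATOMIC_SEQ_CST);     \
})

/* If *ptr == old, store new; always yield the value that was observed. */
#define demo_cmpxchg(ptr, old, new)                                  \
({                                                                   \
        __typeof__(*(ptr)) demo_old_ = (old);                        \
        __atomic_compare_exchange_n((ptr), &demo_old_, (new), 0,     \
                                    __ATOMIC_SEQ_CST,                \
                                    __ATOMIC_SEQ_CST);               \
        demo_old_;                                                   \
})
```

Because each macro is a single statement expression, a caller can write demo_xchg(&x, 0); as a plain statement or use its value, and the temporaries stay local to the macro.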
Chris Metcalf dd196a2b3d tile: add an RTC driver for the Tilera hypervisor
This is a simple RTC driver that lets Tilera hardware boot up and
set the clock correctly.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-17 14:44:36 -04:00
Chris Metcalf 18aecc2b64 arch/tile: finish enabling support for TILE-Gx 64-bit chip
This support was partially present in the existing code (look for
"__tilegx__" ifdefs) but with this change you can build a working
kernel using the TILE-Gx toolchain and ARCH=tilegx.

Most of these files are new, generally adding a foo_64.c file
where previously there was just a foo_32.c file.

The ARCH=tilegx directive redirects to arch/tile, not arch/tilegx,
using the existing SRCARCH mechanism in the top-level Makefile.

Changes to existing files:

- <asm/bitops.h> and <asm/bitops_32.h> changed to factor the
  include of <asm-generic/bitops/non-atomic.h> in the common header.

- <asm/compat.h> and arch/tile/kernel/compat.c changed to remove
  the "const" markers I had put on compat_sys_execve() when trying
  to match some recent similar changes to the non-compat execve.
  It turns out the compat version wasn't "upgraded" to use const.

- <asm/opcode-tile_64.h> and <asm/opcode_constants_64.h> were
  previously included accidentally, with the 32-bit contents.  Now
  they have the proper 64-bit contents.

Finally, I had to hack the existing hacky drivers/input/input-compat.h
to add yet another "#ifdef" for INPUT_COMPAT_TEST (same as x86_64).

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> [drivers/input]
2011-05-12 15:52:12 -04:00
Chris Metcalf be84cb4383 compat: fixes to allow working with tile arch
The existing <asm-generic/unistd.h> mechanism doesn't really provide
enough to create the 64-bit "compat" ABI properly in a generic way,
since the compat ABI is a mix of things where you can re-use the 64-bit
versions of syscalls and things where you need a compat wrapper.

To provide this in the most direct way possible, I added two new macros
to go along with the existing __SYSCALL and __SC_3264 macros: __SC_COMP
and __SC_COMP_3264.  These macros take an additional argument, typically a
"compat_sys_xxx" function, which is passed to __SYSCALL if you define
__SYSCALL_COMPAT when including the header, resulting in a pointer to
the compat function being placed in the generated syscall table.

The change also adds some missing definitions to <linux/compat.h> so that
it actually has declarations for all the compat syscalls, since the
"[nr] = ##call" approach requires proper C declarations for all the
functions included in the syscall table.

Finally, compat.c defines compat_sys_sigpending() and
compat_sys_sigprocmask() even if the underlying architecture doesn't
request it, which tries to pull in undefined compat_old_sigset_t defines.
We need to guard those compat syscall definitions with appropriate
__ARCH_WANT_SYS_xxx ifdefs.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-12 15:51:36 -04:00
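
The table-generation idea in be84cb4383 can be sketched in a single file. This is an assumption-laden illustration: the DEMO_* macros and demo_* functions are hypothetical stand-ins for __SYSCALL, __SC_COMP and real syscalls, and the list macro plays the role of the shared header.

```c
typedef long (*demo_syscall_fn)(void);

/* Stand-ins for one plain syscall and one native/compat pair. */
static long demo_sys_getpid(void)       { return 100; }
static long demo_sys_ioctl(void)        { return 200; }
static long demo_compat_sys_ioctl(void) { return 201; }

/* One shared list, playing the role of <asm-generic/unistd.h>. */
#define DEMO_SYSCALL_LIST                                      \
        DEMO_SYSCALL(0, demo_sys_getpid)                       \
        DEMO_SC_COMP(1, demo_sys_ioctl, demo_compat_sys_ioctl)

/* Each entry becomes a designated initializer of the form [nr] = call. */
#define DEMO_SYSCALL(nr, call) [nr] = (call),

/* Native table: DEMO_SC_COMP ignores its compat argument. */
#define DEMO_SC_COMP(nr, sys, comp) DEMO_SYSCALL(nr, sys)
static const demo_syscall_fn demo_native_table[] = { DEMO_SYSCALL_LIST };
#undef DEMO_SC_COMP

/* Compat table: the same list now selects the compat wrapper. */
#define DEMO_SC_COMP(nr, sys, comp) DEMO_SYSCALL(nr, comp)
static const demo_syscall_fn demo_compat_table[] = { DEMO_SYSCALL_LIST };
#undef DEMO_SC_COMP
```

Since every entry names a function inside an initializer, each function needs a visible C declaration, which is why the commit adds the missing ones to <linux/compat.h>.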
Chris Metcalf d2e48c1d41 arch/tile: update defconfig file to something more useful
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:42:02 -04:00
Oleg Nesterov ceca3c193e tile: do_hardwall_trap: do not play with task->sighand
1. do_hardwall_trap() checks ->sighand != NULL and then takes ->siglock.

   This is unsafe even if the task can't run (I assume it is pinned to
   the same CPU), its parent can reap the task and set ->sighand = NULL
   right after this check. Even if the compiler doesn't read ->sighand
   twice and this memory can't go away, __group_send_sig_info() is wrong
   after that. Use do_send_sig_info().

2. Send SIGILL to the thread, not to the whole process. Unless the thread
   has a handler or has blocked it, this kills the whole thread-group as before.
   IIUC, different threads can be bound to different rect's.

3. Check PF_EXITING instead of ->sighand. A zombie thread can go away
   but its ->sighand can be !NULL.

Reported-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:41:53 -04:00
KOSAKI Motohiro dc0b124d8e tile: replace mm->cpu_vm_mask with mm_cpumask()
We plan to change the definition of mm->cpu_vm_mask later, so this patch
converts its users to the proper mm_cpumask() macro.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:41:44 -04:00
James Hogan ef0aaf873e tile,mn10300: add device parameter to dma_cache_sync()
Since v2.6.20 "Pass struct dev pointer to dma_cache_sync()"
(d3fa72e455), dma_cache_sync() takes a
struct dev pointer, but these appear to be missing from the tile and
mn10300 implementations, so add them.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
[cmetcalf@tilera.com: took only the "tile" portion as I don't maintain mn10300]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:41:36 -04:00
Chris Metcalf d07bd86d82 arch/tile: clarify flush_buffer()/finv_buffer() function names
They are only applicable for locally-homecached memory ranges, so
change their names to {flush,finv}_buffer_local().  Change inv_buffer()
to just do an mf instead of any kind of fancier barrier, since you're
obviously not going to be waiting for anything once the local homecache
is invalidated.

Fix tilepro.c network driver not to bother calling finv_buffer when
stopping the EPP, but just mf after memset to ensure that it will not
see any packet data after we finish stopping; use finv_buffer_remote()
when doing exit-time cleanup.

This also fixes a (not very interesting) generic Linux build failure
where drivers/scsi/st.c declares its own flush_buffer().

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:41:20 -04:00
Chris Metcalf 5386e73589 arch/tile: kernel-related cleanups from removing static page size
User space code has been able to discover the static page size
by including a special <hv/pagesize.h> file.  In the current release,
that file is now gone, and <asm/page.h> doesn't rely on it.  The
getpagesize() API is now the only way for userspace to get the page size.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:41:13 -04:00
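
For user space affected by 5386e73589, a run-time query replaces the compile-time constant. The sketch below uses POSIX sysconf(_SC_PAGESIZE), which reports the same value as getpagesize(); the demo_* helper names are hypothetical, and the rounding helper assumes the page size is a power of two.

```c
#include <unistd.h>

/* Query the page size at run time instead of compiling in a constant. */
static long demo_page_size(void)
{
        return sysconf(_SC_PAGESIZE);
}

/* Round len up to a whole number of pages (assumes a power-of-two size). */
static unsigned long demo_page_align(unsigned long len)
{
        unsigned long page = (unsigned long)demo_page_size();

        return (len + page - 1) & ~(page - 1);
}
```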
Chris Metcalf 28d717411b arch/tile: various header improvements for building drivers
This change adds a number of missing headers in asm (fb.h, parport.h,
serial.h, and vga.h) using the minimal generic versions.

It also adds a number of missing interfaces that showed up as build
failures when trying to build various drivers not normally included in the
"tile" distribution: ioremap_wc(), memset_io(), io{read,write}{16,32}be(),
virt_to_bus(), bus_to_virt(), irq_canonicalize(), __pte(), __pgd(),
and __pmd().  I also added a cast in virt_to_page() since not all callers
pass a pointer.

I fixed <asm/stat.h> to properly include a __KERNEL__ guard for the
__ARCH_WANT_STAT64 symbol, and <asm/swab.h> to use __builtin_bswap32()
even for our 64-bit architecture, since the same code is produced.

I added an export for get_cycles(), since it's used in some modules.

And I made <arch/spr_def.h> properly include the __KERNEL__ guard,
even though it's not yet exported, since it likely will be soon.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:40:54 -04:00
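
As a rough picture of the byte-order helpers that 28d717411b touches, the sketch below wraps __builtin_bswap32() (a GCC/Clang builtin) and reads a big-endian 32-bit value from ordinary memory. The demo_* names are hypothetical, and the load helper works on a plain buffer, not on MMIO space as ioread32be() does.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using the compiler builtin. */
static inline uint32_t demo_swab32(uint32_t x)
{
        return __builtin_bswap32(x);
}

/* Assemble a big-endian 32-bit value from four bytes of plain memory. */
static inline uint32_t demo_load32be(const unsigned char *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```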
Chris Metcalf dbb434214e arch/tile: disable GX prefetcher during cache flush
Otherwise, it's possible to end up with the prefetcher pulling
data into cache that the code believes has been flushed.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:40:46 -04:00
Chris Metcalf 43d9ebba93 arch/tile: tolerate disabling CONFIG_BLK_DEV_INITRD
The code accidentally was relying on this configuration option
being selected.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:40:36 -04:00
Chris Metcalf 229f4df1fb arch/tile: properly flush the I$ when unloading kernel modules
Otherwise, in principle, there could be stale I$ data present
next time the page that previously held the kernel module code was
used to run some new code.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:40:25 -04:00
Chris Metcalf 7194988fb5 arch/tile: disable SD_WAKE_AFFINE flag on CPU/NODE scheduling domain
This allows processes to spread more effectively to multiple cores
(particularly important on 64-core chips!).

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:40:16 -04:00
Chris Metcalf df29ccb6c0 arch/tile: allow nonatomic stores to interoperate with fast atomic syscalls
This semantic was already true for atomic operations within the kernel,
and this change makes it true for the fast atomic syscalls (__NR_cmpxchg
and __NR_atomic_update) as well.  Previously, user-space had to use
the fast atomic syscalls exclusively to update memory, since raw stores
could lose a race with the atomic update code even when the atomic update
hadn't actually modified the value.

With this change, we no longer write back the value to memory if it
hasn't changed.  This allows certain types of idioms in user space to
work as expected, e.g. "atomic exchange" to acquire a spinlock, followed
by a raw store of zero to release the lock.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:40:07 -04:00
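
The spinlock idiom mentioned in df29ccb6c0 looks roughly like the sketch below. It is an approximation: the GCC/Clang __atomic builtins stand in for the tile fast atomic syscalls, and the demo_* names are hypothetical.

```c
/* Try once: atomically swap in 1; the lock is ours if the old value was 0. */
static inline int demo_trylock(int *lock)
{
        return __atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Acquire by repeating the atomic exchange until it succeeds. */
static inline void demo_lock(int *lock)
{
        while (!demo_trylock(lock))
                ;       /* spin */
}

/* Release with an ordinary store of zero; no atomic read-modify-write. */
static inline void demo_unlock(int *lock)
{
        __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

The release path is the part the commit makes safe: a failed exchange no longer writes the old value back, so it cannot overwrite the unlocking store.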
Chris Metcalf 398fa5a931 arch/tile: improve support for PCI hotplug
Note that this is not complete hot-plug support; hot-unplug is not included.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-04 14:39:53 -04:00
Chris Metcalf 313ce674d3 arch/tile: support TIF_NOTIFY_RESUME
This support is required for CONFIG_KEYS, NFSv4 kernel DNS, etc.
The change is slightly more complex than the minimal thing, since
I took advantage of having to go into the assembly code to just
move a bunch of stuff into C code: specifically, the schedule(),
do_async_page_fault(), do_signal(), and single_step_once() support,
in addition to the TIF_NOTIFY_RESUME support.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-02 18:53:35 -04:00
Chris Metcalf 93013a0f53 arch/tile: refactor backtracing code
This change is the result of some work to make the backtrace code more
shareable between kernel, libc, and gdb.

For the kernel, some good effects are to eliminate the hacky
"VirtualAddress" typedef in favor of "unsigned long", to eliminate a
bunch of spurious kernel doc comments, to remove the dead "bt_read_memory"
function, and to use "__tilegx__" in #ifdefs instead of "TILE_CHIP".

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-05-02 13:49:14 -04:00
Linus Torvalds fc7b3ff1ac Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] kvm-390: Let kernel exit SIE instruction on work
  [S390] dasd: check sense type in device change handler
  [S390] pfault: fix token handling
  [S390] qdio: reset error states immediately
  [S390] fix page table walk for changing page attributes
  [S390] prng: prevent access beyond end of stack
  [S390] dasd: fix race between open and offline
2011-04-26 11:38:48 -07:00
Linus Torvalds 71e9e6a582 Merge branch 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson
* 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
  rtc: fix coh901331 startup crash
  mach-ux500: fix i2c0 device setup regression
2011-04-25 19:00:55 -07:00
Linus Torvalds 686c4cbb10 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add missing syscore_suspend() and syscore_resume() calls
  PM: Fix error code paths executed after failing syscore_suspend()
2011-04-23 22:35:16 -07:00
Linus Torvalds 8d082f8f3f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
  ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
  ASoC: add a module alias to the FSI driver
  ALSA: emu10k1 - Fix "Music" controls to "Synth" controls in documents
  ARM: s3c2440: gta02; Register dfbmcs320 device for BT audio interface
  ASoC: codecs: JZ4740: Fix OOPS
  ASoC: Fix output PGA enabling in wm_hubs CODECs
  ASoC: sn95031: decorate function with __devexit_p()
  ASoC: SAMSUNG: Fix the inverted clocks handling for pcm driver
  ASoC: sst_platform: Fix lock acquring
  ASoC: fsi: driver safely remove for against irq
  ASoC: fsi: modify vague PM control on probe
  ASoC: fsi: take care in failing case of dai register
  MAINTAINERS: Update Samsung ASoC maintainer's id
  ASoC: WM8903: HP and Line out PGA/mixer DAPM fixes
  ASoC: Set left channel volume update bits for WM8994
  ASoC: fix config error path
  ASoC: check channel mismatch between cpu_dai and codec_dai
  ASoC: Tegra: Suspend/resume support
2011-04-22 14:59:07 -07:00
Linus Torvalds 258ba6a5a9 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Update/fix Intel Nehalem cache events
  perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
  x86, perf event: Turn off unstructured raw event access to offcore registers
  perf: Support Xeon E7's via the Westmere PMU driver
2011-04-22 11:31:27 -07:00
Linus Torvalds d6d61c97e6 Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xtensa: Fixup irq conversion fallout and nmi_count
2011-04-22 11:31:21 -07:00
Peter Zijlstra f4929bd372 perf, x86: Update/fix Intel Nehalem cache events
Change the Nehalem cache events to use retired memory instruction counters
(similar to Westmere), this greatly improves the provided stats.

Using:

int main(void)
{
        int i;

        for (i = 0; i < 1000000000; i++) {
                asm("mov (%%rsp), %%rbx;"
                    "mov %%rbx, (%%rsp);" : : : "rbx");
        }
        return 0;
}

We find:

 $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
      4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
      1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
         1.565184942  seconds time elapsed   ( +-   0.005% )

The 5b is surprising - we'd expect 1b:

 $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
      1,000,021,961 r10b:u                     ( +-   0.000% )
      1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
         1.565055422  seconds time elapsed   ( +-   0.003% )

Which this patch thus fixes.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 13:50:27 +02:00
Cyrill Gorcunov 1ea5a6afd9 perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
It's not enough to simply disable the event on overflow: the
cpuc->active_mask should be cleared as well, otherwise the counter
may stay marked "active" even though it has really been disabled
(which may leave the user unable to use this counter again).

Don pointed out that:

 " I also noticed this patch fixed some unknown NMIs
   on a P4 when I stressed the box".

Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 10:21:34 +02:00
Ingo Molnar b52c55c6a2 x86, perf event: Turn off unstructured raw event access to offcore registers
Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hex number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patterns along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: actual implementation could have more
  sophistication and more parameters - as long as they center around
  similarly simple usecases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.

Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 10:02:53 +02:00
Andi Kleen b2508e828d perf: Support Xeon E7's via the Westmere PMU driver
There's a new model number public, 47, for Xeon E7 (aka Westmere EX).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 08:27:29 +02:00
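
A change like b2508e828d usually amounts to one more case label in a model switch. The sketch below is hypothetical (demo_* names, simplified enum) and not the kernel code. Only model 47 comes from the commit text; models 37 and 44 are the other family 6 Westmere numbers as I understand Intel's numbering.

```c
enum demo_pmu {
        DEMO_PMU_UNKNOWN,
        DEMO_PMU_WESTMERE,
};

/* Map an Intel family 6 model number to the PMU driver that handles it. */
static enum demo_pmu demo_pmu_for_model(unsigned int model)
{
        switch (model) {
        case 37:        /* Westmere */
        case 44:        /* Westmere EP */
        case 47:        /* Westmere EX, aka Xeon E7: the new entry */
                return DEMO_PMU_WESTMERE;
        default:
                return DEMO_PMU_UNKNOWN;
        }
}
```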
David Rientjes 7a6c654782 x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y
are currently broken when NUMA emulation is enabled because the
code does not iterate through every emulated node and bind cpus
that have affinity to it.

NUMA emulation should bind each cpu to every local node to
accurately represent the true NUMA topology of the underlying
machine.

debug_cpumask_set_cpu() needs to be fixed at the same time so
that the debugging information that it emits shows the new
cpumask of the node being assigned when the cpu is being added
or removed.

It can now take responsibility for setting or clearing the cpu
itself to remove the need for duplicate code.

Also change its last parameter, "enable", to have the correct bool
type since it can only be true or false.

 -v2: Fix the return statements, by Kosaki Motohiro

Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-21 11:31:00 +02:00
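
The binding rule described in 7a6c654782 can be modelled with plain bitmasks. Everything here is a hypothetical sketch (struct demo_numa, demo_numa_set_cpu() and the 64-cpu limit are assumptions), meant only to show the loop over every emulated node and the bool "enable" parameter.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_NODES 8

struct demo_numa {
        int emu_to_phys[DEMO_MAX_NODES];        /* physical node behind each emulated node */
        uint64_t node_cpumask[DEMO_MAX_NODES];  /* one bit per cpu, cpus 0..63 */
        int nr_emu_nodes;
};

/* Set or clear cpu in the mask of every emulated node carved from phys_node. */
static void demo_numa_set_cpu(struct demo_numa *n, int cpu, int phys_node,
                              bool enable)
{
        for (int nid = 0; nid < n->nr_emu_nodes; nid++) {
                if (n->emu_to_phys[nid] != phys_node)
                        continue;
                if (enable)
                        n->node_cpumask[nid] |= UINT64_C(1) << cpu;
                else
                        n->node_cpumask[nid] &= ~(UINT64_C(1) << cpu);
        }
}
```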
David Rientjes 37f8527dbf Revert "x86, NUMA: Fix fakenuma boot failure"
Andreas Herrmann reported that 7d6b46707f ("x86, NUMA: Fix fakenuma
boot failure") causes certain physical NUMA topologies (for example
AMD Magny-Cours) to move sibling cpus to a single node when in reality
they are in separate domains.

This may result in some nodes being completely void of cpus, which
doesn't accurately represent the correct topology. The system will
boot, but will have suboptimal NUMA performance.

This commit was intended as a fix for NUMA emulation, but should
not cause a regression for real NUMA machines as a side effect.

( There will be a separate fix for the numa-debug code, which
  will not affect physical topologies. )

Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-21 11:30:59 +02:00
Linus Torvalds f3e96492f6 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6881/1: cputype.h uses __attribute_const__ which requires including kernel.h
  ARM: Add new syscalls
2011-04-20 17:40:45 -07:00
Linus Torvalds 8653b3f1d5 Merge branch 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32
  xen: do not create the extra e820 region at an addr lower than 4G
2011-04-20 17:40:25 -07:00
Linus Walleij cf568c58eb mach-ux500: fix i2c0 device setup regression
Adding two sets of I2C devices to the same bus doesn't quite work,
at least not anymore. Stash one array and determine how much of it
shall be added instead.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-04-20 18:43:53 +02:00
Stefano Stabellini ee176455e2 xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32
The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on
x86_64; in fact, early_ioremap is not used at all to set up the initial
pagetable on x86_32.
Moreover on x86_32 the two checks are wrong because the range
pgt_buf_start..pgt_buf_end initially should be mapped RW because
the pages in the range are not pagetable pages yet and haven't been
cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is
part of the initial mapping, xen_alloc_pte is capable of turning
the ptes RO when they become pagetable pages.

Fix the issue and improve the readability of the code by providing two
different implementations of mask_rw_pte for x86_32 and x86_64.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20 09:43:13 -04:00
Stefano Stabellini 24bdb0b62c xen: do not create the extra e820 region at an addr lower than 4G
Do not add the extra e820 region at a physical address lower than 4G
because it breaks e820_end_of_low_ram_pfn().

It is OK for us to move the xen_extra_mem_start up and down because this
is the index of the memory that can be ballooned in/out - it is memory
not available to the kernel during bootup.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20 09:04:40 -04:00
Carsten Otte 9ff4cfb3fc [S390] kvm-390: Let kernel exit SIE instruction on work
From: Christian Borntraeger <borntraeger@de.ibm.com>

This patch fixes the sie exit on interrupts. The low level
interrupt handler returns to the PSW address in pt_regs and not
to the PSW address in the lowcore.
Without this fix a cpu bound guest might never leave guest state
since the host interrupt handler would blindly return to the
SIE instruction, even on need_resched and friends.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-20 10:15:44 +02:00
Heiko Carstens e35c76cd47 [S390] pfault: fix token handling
f6649a7e "[S390] cleanup lowcore access from external interrupts" changed
handling of external interrupts. Instead of letting the external interrupt
handlers access the per cpu lowcore, the entry code of the kernel now reads
all fields that are necessary and passes them to the handlers.
The pfault interrupt handler was incorrectly converted. It tries to
dereference a value which used to be a pointer to a lowcore field. After
the conversion, however, it is no longer the pointer to the field but its
content. So instead of a dereference only a cast is needed to get the
task pointer that caused the pfault.

Fixes a NULL pointer dereference and a subsequent kernel crash:

Unable to handle kernel pointer dereference at virtual kernel address (null)
Oops: 0004 [#1] SMP
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
                   loop qeth_l3 qeth vmur ccwgroup ext3 jbd mbcache dm_mod
                   dasd_eckd_mod dasd_diag_mod dasd_mod
CPU: 0 Not tainted 2.6.38-2-s390x #1
Process cron (pid: 1106, task: 000000001f962f78, ksp: 000000001fa0f9d0)
Krnl PSW : 0404200180000000 000000000002c03e (pfault_interrupt+0xa2/0x138)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000001 0000000000000000 0000000000000001
           000000001f962f78 0000000000518968 0000000090000002 000000001ff03280
           0000000000000000 000000000064f000 000000001f962f78 0000000000002603
           0000000006002603 0000000000000000 000000001ff7fe68 000000001ff7fe48
Krnl Code: 000000000002c036: 5820d010            l       %r2,16(%r13)
           000000000002c03a: 1832                lr      %r3,%r2
           000000000002c03c: 1a31                ar      %r3,%r1
          >000000000002c03e: ba23d010            cs      %r2,%r3,16(%r13)
           000000000002c042: a744fffc            brc     4,2c03a
           000000000002c046: a7290002            lghi    %r2,2
           000000000002c04a: e320d0000024        stg     %r2,0(%r13)
           000000000002c050: 07f0                bcr     15,%r0
Call Trace:
 ([<000000001f962f78>] 0x1f962f78)
  [<000000000001acda>] do_extint+0xf6/0x138
  [<000000000039b6ca>] ext_no_vtime+0x30/0x34
  [<000000007d706e04>] 0x7d706e04
Last Breaking-Event-Address:
  [<0000000000000000>] 0x0

For stable maintainers:
the first kernel which contains this bug is 2.6.37.

Reported-by: Stephen Powell <zlinuxman@wowway.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-20 10:15:44 +02:00
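
The dereference-versus-cast distinction in e35c76cd47 can be shown in isolation. This is a hypothetical user-space sketch (struct demo_task and both helpers are invented for illustration), not the s390 handler.

```c
#include <stdint.h>

struct demo_task {
        int pid;
};

/*
 * Old convention (sketch): the handler is given the address of a field
 * that holds the token, so one dereference is required.
 */
static struct demo_task *demo_task_from_field(const uint64_t *field)
{
        return (struct demo_task *)(uintptr_t)*field;
}

/*
 * New convention (sketch): the entry code has already read the field and
 * passes its content, so only a cast is needed. Dereferencing here would
 * treat the token itself as an address to load from.
 */
static struct demo_task *demo_task_from_value(uint64_t value)
{
        return (struct demo_task *)(uintptr_t)value;
}
```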
Jan Glauber e4c031b4f2 [S390] fix page table walk for changing page attributes
The page table walk for changing page attributes used the wrong
address for pgd/pud/pmd lookups if the range was bigger than
a pmd entry. Fix the lookup by using the correct address.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-20 10:15:43 +02:00
Jan Glauber c708c57e24 [S390] prng: prevent access beyond end of stack
While initializing the state of the prng only the first 8 bytes of
random data were used, the second 8 bytes were read from the memory
after the stack. If only 64 bytes of the kernel stack are used and
CONFIG_DEBUG_PAGEALLOC is enabled a kernel panic may occur because of
the invalid page access. Use the correct multiplier to stay within
the random data buffer.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-20 10:15:43 +02:00
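
The message for c708c57e24 does not show the code, so the following is only a guess at the general shape of such a bug: an index that is scaled twice, once by the programmer and once by pointer arithmetic. The sketch shows a bounds-safe way to walk a 16-byte seed buffer in 8-byte words; demo_mix_seed() and DEMO_SEED_BYTES are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_SEED_BYTES 16

/* XOR every 8-byte word of the 16-byte seed buffer into state. */
static uint64_t demo_mix_seed(uint64_t state,
                              const unsigned char buf[DEMO_SEED_BYTES])
{
        for (size_t off = 0; off + 8 <= DEMO_SEED_BYTES; off += 8) {
                uint64_t word;

                /*
                 * off is a byte offset: 0, then 8. Adding off to a
                 * uint64_t pointer instead would advance 8 * off bytes
                 * and leave the buffer on the second pass.
                 */
                memcpy(&word, buf + off, sizeof(word));
                state ^= word;
        }
        return state;
}
```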
Rafael J. Wysocki 19234c0819 PM: Add missing syscore_suspend() and syscore_resume() calls
Device suspend/resume infrastructure is used not only by the suspend
and hibernate code in kernel/power, but also by APM, Xen and the
kexec jump feature.  However, commit 40dc166cb5
(PM / Core: Introduce struct syscore_ops for core subsystems PM)
failed to add syscore_suspend() and syscore_resume() calls to that
code, which generally leads to breakage when the features in question
are used.

To fix this problem, add the missing syscore_suspend() and
syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c
and drivers/xen/manage.c.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
2011-04-20 00:36:11 +02:00
Thomas Gleixner 2ea4db65be xtensa: Fixup irq conversion fallout and nmi_count
Some unnamed moron fatfingered the arguments of the irq chip callbacks
to irq_chip instead of irq_data.

While at it remove the nmi_count() print in arch_show_interrupts()
which was already broken before the irq conversion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-20 00:32:09 +02:00
Linus Torvalds 9d914b3ef3 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, gart: Make sure GART does not map physmem above 1TB
  x86, gart: Set DISTLBWALKPRB bit always
  x86, gart: Convert spaces to tabs in enable_gart_translation
2011-04-19 10:58:13 -07:00
Linus Torvalds 96ad999918 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Fix AMD family 15h FPU event constraints
  perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
  perf evsel: Fix use of inherit
  perf hists browser: Fix seg fault when annotate null symbol
2011-04-19 10:56:02 -07:00
Robert Richter 855357a217 perf, x86: Fix AMD family 15h FPU event constraints
Depending on the unit mask settings some FPU events may be scheduled
only on cpu counter #3. This patch fixes this.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@googlemail.com>
Link: http://lkml.kernel.org/r/1302913676-14352-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-19 10:07:55 +02:00
Andre Przywara 83112e688f perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
With AMD cpu family 15h a unit mask was introduced for the Data Cache
Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0
(first data cache miss or streaming store to a 64 B cache line) of
this mask to properly count data cache misses.

Now we set this bit for all families and models. In case a PMU does
not implement a unit mask for event 0x041 the bit is ignored.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1302913676-14352-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-19 10:07:54 +02:00
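
To make the unit-mask change in 83112e688f concrete, the sketch below packs an event select and a unit mask in the common x86 raw-event layout (event select in bits 0-7, unit mask in bits 8-15). demo_raw_event() is a hypothetical helper, and AMD's extended event-select bits are deliberately left out.

```c
#include <stdint.h>

/* Raw event code: event select in bits 0-7, unit mask in bits 8-15. */
static inline uint64_t demo_raw_event(uint8_t event_select, uint8_t unit_mask)
{
        return (uint64_t)event_select | ((uint64_t)unit_mask << 8);
}
```

With bit 0 of the unit mask enabled, the Data Cache Miss event 0x41 becomes demo_raw_event(0x41, 0x01), that is 0x0141.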
Linus Torvalds e024f69de9 Merge branch 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm
* 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
  msm: timer: fix missing return value
  msm: Remove extraneous ffa device check
2011-04-18 15:44:29 -07:00
Joerg Roedel 665d3e2af8 x86, gart: Make sure GART does not map physmem above 1TB
The GART can only map physical memory below 1TB. Make sure
the gart driver in the kernel does not try to map memory
above 1TB.

Cc: <stable@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-5-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-04-18 09:26:49 -07:00
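
The limit described in 665d3e2af8 boils down to a range check. The sketch below is hypothetical (demo_gart_can_map() and DEMO_GART_MAX_PHYS are invented names, with 1TB taken as 2^40 bytes) and is not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_GART_MAX_PHYS (UINT64_C(1) << 40)  /* 1TB */

/* True if the range [phys, phys + size) lies entirely below 1TB. */
static bool demo_gart_can_map(uint64_t phys, uint64_t size)
{
        /* Compare against the limit minus size so phys + size cannot overflow. */
        return size <= DEMO_GART_MAX_PHYS &&
               phys <= DEMO_GART_MAX_PHYS - size;
}
```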
Joerg Roedel c34151a742 x86, gart: Set DISTLBWALKPRB bit always
The DISTLBWALKPRB bit must be set for the GART because the
gatt table is mapped UC. But the current code does not set
the bit at boot when the BIOS has set up the aperture correctly.
Fix that by setting this bit when enabling the GART instead
of the other places.

Cc: <stable@kernel.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-4-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-04-18 09:26:48 -07:00
Joerg Roedel af289bfe15 x86, gart: Convert spaces to tabs in enable_gart_translation
Probably by copy&paste this function was indented by spaces.
Convert this to tabs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-3-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-04-18 09:26:48 -07:00