The current code groups up to 16 nodes in a level and then puts an
ALLNODES domain spanning the entire tree on top of that. This doesn't
reflect the numa topology and esp for the smaller not-fully-connected
machines out there today this might make a difference.
Therefore, build a proper numa topology based on node_distance().
Since there's no fixed numa layers anymore, the static SD_NODE_INIT
and SD_ALLNODES_INIT aren't usable anymore, the new code tries to
construct something similar and scales some values either on the
number of cpus in the domain and/or the node_distance() ratio.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: linux-alpha@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: sparclinux@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Greg Pearson <greg.pearson@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: bob.picco@oracle.com
Cc: chris.mason@oracle.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-r74n3n8hhuc2ynbrnp3vt954@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use asm-generic/dma-mapping-common.h to handle all DMA mapping operations
and establish a default get_dma_ops() that forwards all operations to the
existing code.
Augment dev_archdata to carry a pointer to the struct dma_map_ops, allowing
DMA operations to be overridden on a per device basis. Currently this is
never filled in, so the default dma_map_ops are used. A follow-on patch
sets this for Octeon PCI devices.
Also initialize the dma_debug system as it is now used if it is configured.
Includes fixes by Kevin Cernekee <cernekee@gmail.com>.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Patchwork: http://patchwork.linux-mips.org/patch/1637/
Patchwork: http://patchwork.linux-mips.org/patch/1678/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Any function defined in a header file should be inline. This helps us
avoid 'unused' compiler warnings when we include the files in more
places in subsequent patches.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Patchwork: http://patchwork.linux-mips.org/patch/1636/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Now each architecture has the own dma_get_cache_alignment implementation.
dma_get_cache_alignment returns the minimum DMA alignment. Architectures
define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that malloc'ed
buffer is DMA-safe; the buffer doesn't share a cache with the others). So
we can unify dma_get_cache_alignment implementations.
This patch:
dma_get_cache_alignment() needs to know if an architecture defines
ARCH_KMALLOC_MINALIGN or not (needs to know if architecture has DMA
alignment restriction). However, slab.h define ARCH_KMALLOC_MINALIGN if
architectures doesn't define it.
Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN.
ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub
(except for crypto).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pcibus_to_node can return -1 if we cannot determine which node a pci bus
is on. If passed -1, cpumask_of_node will negatively index the lookup array
and pull in random data:
# cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus
00000000,00000003,00000000,00000000
# cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist
64-65
Change cpumask_of_node to check for -1 and return cpu_all_mask in this
case:
# cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus
ffffffff,ffffffff,ffffffff,ffffffff
# cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist
0-127
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Patchwork: http://patchwork.linux-mips.org/patch/831/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't
really mind too much, SD_BALANCE_NEWIDLE picks up most of the
slack.
On a dual socket, quad core, dual thread nehalem system:
sysbench (--num_threads=16):
SD_BALANCE_WAKE-: 13982 tx/s
SD_BALANCE_WAKE+: 15688 tx/s
kbuild (-j16):
SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )
(same within noise)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The problem with wake_idle() is that is doesn't respect things like
cpu_power, which means it doesn't deal well with SMT nor the recent
RT interaction.
To cure this, it needs to do what sched_balance_self() does, which
leads to the possibility of merging select_task_rq_fair() and
sched_balance_self().
Modify sched_balance_self() to:
- update_shares() when walking up the domain tree,
(it only called it for the top domain, but it should
have done this anyway), which allows us to remove
this ugly bit from try_to_wake_up().
- do wake_affine() on the smallest domain that contains
both this (the waking) and the prev (the wakee) cpu for
WAKE invocations.
Then use the top-down balance steps it had to replace wake_idle().
This leads to the dissapearance of SD_WAKE_BALANCE and
SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
Touch all topology bits to replace the old with new SD flags --
platforms might need re-tuning, enabling SD_BALANCE_WAKE
conditionally on a NUMA distance seems like a good additional
feature, magny-core and small nehalem systems would want this
enabled, systems with slow interconnects would not.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Everyone defines it, and only one person uses it
(arch/mips/sgi-ip27/ip27-nmi.c). So just open code it there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-mips@linux-mips.org
We add a dev parameter to plat_unmap_dma_mem(), and hooks for
plat_dma_supported() and plat_extra_sync_for_device() which should be
nop changes for all existing targets.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ralf Baechle <ralf@linux-mips.org>
Mathieu Desnoyers reported this build failure on powerpc:
kernel/sched.c: In function 'sd_init_NODE':
kernel/sched.c:7319: error: non-static initialization of a flexible array member
kernel/sched.c:7319: error: (near initialization for '(anonymous)')
this happens because .span changed to cpumask_var_t, hence
the static CPU_MASK_NONE initializers in the SD_*_INIT
templates are not type-correct anymore.
Remove them, as they default to empty anyway.
Also remove them from IA64, MIPS and SH.
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>