linux

Commit Graph

Author	SHA1	Message	Date
David Rientjes	8f9f8d9e80	slab: add memory hotplug support Slab lacks any memory hotplug support for nodes that are hotplugged without cpus being hotplugged. This is possible at least on x86 CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a seperate node. It can also be done manually by writing the start address to /sys/devices/system/memory/probe for kernels that have CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and then onlining the new memory region. When a node is hotadded, a nodelist for that node is allocated and initialized for each slab cache. If this isn't completed due to a lack of memory, the hotadd is aborted: we have a reasonable expectation that kmalloc_node(nid) will work for all caches if nid is online and memory is available. Since nodelists must be allocated and initialized prior to the new node's memory actually being online, the struct kmem_list3 is allocated off-node due to kmalloc_node()'s fallback. When an entire node would be offlined, its nodelists are subsequently drained. If slab objects still exist and cannot be freed, the offline is aborted. It is possible that objects will be allocated between this drain and page isolation, so it's still possible that the offline will still fail, however. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2010-04-07 19:28:31 +03:00
Joe Perches	e92dd4fd1a	slab: Fix continuation lines Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2010-03-28 20:08:16 +03:00
Pekka Enberg	e2b093f3e9	Merge branches 'slab/cleanups', 'slab/failslab', 'slab/fixes' and 'slub/percpu' into slab-for-linus	2010-03-04 12:07:50 +02:00
Dmitry Monakhov	4c13dd3b48	failslab: add ability to filter slab caches This patch allow to inject faults only for specific slabs. In order to preserve default behavior cache filter is off by default (all caches are faulty). One may define specific set of slabs like this: # mark skbuff_head_cache as faulty echo 1 > /sys/kernel/slab/skbuff_head_cache/failslab # Turn on cache filter (off by default) echo 1 > /sys/kernel/debug/failslab/cache-filter # Turn on fault injection echo 1 > /sys/kernel/debug/failslab/times echo 1 > /sys/kernel/debug/failslab/probability Acked-by: David Rientjes <rientjes@google.com> Acked-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2010-02-26 19:19:39 +02:00
Nick Piggin	44b57f1cc7	slab: fix regression in touched logic When factoring common code into transfer_objects in commit `3ded175` ("slab: add transfer_objects() function"), the 'touched' logic got a bit broken. When refilling from the shared array (taking objects from the shared array), we are making use of the shared array so it should be marked as touched. Subsequently pulling an element from the cpu array and allocating it should also touch the cpu array, but that is taken care of after the alloc_done label. (So yes, the cpu array was getting touched = 1 twice). So revert this logic to how it worked in earlier kernels. This also affects the behaviour in __drain_alien_cache, which would previously 'touch' the shared array and now does not. I think it is more logical not to touch there, because we are pushing objects into the shared array rather than pulling them off. So there is no good reason to postpone reaping them -- if the shared array is getting utilized, then it will get 'touched' in the alloc path (where this patch now restores the touch). Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2010-01-30 15:02:39 +02:00
Haicheng Li	f3186a9c51	slab: initialize unused alien cache entry as NULL at alloc_alien_cache(). Comparing with existing code, it's a simpler way to use kzalloc_node() to ensure that each unused alien cache entry is NULL. CC: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2010-01-11 18:56:07 +02:00
Pekka Enberg	00afa75806	SLAB: Fix lockdep annotation breakage Commit `ce79ddc8e2` ("SLAB: Fix lockdep annotations for CPU hotplug") broke init_node_lock_keys() off-slab logic which causes lockdep false positives. Fix that up by reverting the logic back to original while keeping CPU hotplug fixes intact. Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-and-tested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-12-28 20:57:27 +02:00
Linus Torvalds	55db493b65	Merge branch 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: cpumask: rename tsk_cpumask to tsk_cpus_allowed cpumask: don't recommend set_cpus_allowed hack in Documentation/cpu-hotplug.txt cpumask: avoid dereferencing struct cpumask cpumask: convert drivers/idle/i7300_idle.c to cpumask_var_t cpumask: use modern cpumask style in drivers/scsi/fcoe/fcoe.c cpumask: avoid deprecated function in mm/slab.c cpumask: use cpu_online in kernel/perf_event.c	2009-12-17 17:00:20 -08:00
Linus Torvalds	dcc7cd0112	Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6 * 'kmemleak' of git://linux-arm.org/linux-2.6: kmemleak: fix kconfig for crc32 build error kmemleak: Reduce the false positives by checking for modified objects kmemleak: Show the age of an unreferenced object kmemleak: Release the object lock before calling put_object() kmemleak: Scan the _ftrace_events section in modules kmemleak: Simplify the kmemleak_scan_area() function prototype kmemleak: Do not use off-slab management with SLAB_NOLEAKTRACE	2009-12-17 16:00:19 -08:00
Rusty Russell	58463c1fe2	cpumask: avoid deprecated function in mm/slab.c These days we use cpumask_empty() which takes a pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christoph Lameter <cl@linux-foundation.org>	2009-12-17 11:43:13 +10:30
Linus Torvalds	2205afa7d1	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf sched: Fix build failure on sparc perf bench: Add "all" pseudo subsystem and "all" pseudo suite perf tools: Introduce perf_session class perf symbols: Ditch dso->find_symbol perf symbols: Allow lookups by symbol name too perf symbols: Add missing "Variables" entry to map_type__name perf symbols: Add support for 'variable' symtabs perf symbols: Introduce ELF counterparts to symbol_type__is_a perf symbols: Introduce symbol_type__is_a perf symbols: Rename kthreads to kmaps, using another abstraction for it perf tools: Allow building for ARM hw-breakpoints: Handle bad modify_user_hw_breakpoint off-case return value perf tools: Allow cross compiling tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING Trivial conflict due to different fixes to modify_user_hw_breakpoint() in include/linux/hw_breakpoint.h	2009-12-14 10:13:22 -08:00
Linus Torvalds	d0316554d3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c	2009-12-14 09:58:24 -08:00
Pekka Enberg	355d79c87a	Merge branches 'slab/fixes', 'slab/kmemleak', 'slub/perf' and 'slub/stats' into for-linus	2009-12-12 10:12:19 +02:00
Li Zefan	0bb38a5cde	tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE For slab, if CONFIG_KMEMTRACE and CONFIG_DEBUG_SLAB are not set, __do_kmalloc() will not track callers: # ./perf record -f -a -R -e kmem:kmalloc ^C # ./perf trace ... perf-2204 [000] 147.376774: kmalloc: call_site=c0529d2d ... perf-2204 [000] 147.400997: kmalloc: call_site=c0529d2d ... Xorg-1461 [001] 147.405413: kmalloc: call_site=0 ... Xorg-1461 [001] 147.405609: kmalloc: call_site=0 ... konsole-1776 [001] 147.405786: kmalloc: call_site=0 ... Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: linux-mm@kvack.org <linux-mm@kvack.org> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <4B21F8AE.6020804@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-11 09:17:03 +01:00
Li Zefan	0f24f1287a	tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING Define kmem_trace_alloc_{,node}_notrace() if CONFIG_TRACING is enabled, otherwise perf-kmem will show wrong stats ifndef CONFIG_KMEM_TRACE, because a kmalloc() memory allocation may be traced by both trace_kmalloc() and trace_kmem_cache_alloc(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: linux-mm@kvack.org <linux-mm@kvack.org> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <4B21F89A.7000801@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-11 09:17:02 +01:00
J. R. Okajima	ddbf2e8366	slab, kmemleak: pass the correct pointer to kmemleak_erase() In ____cache_alloc(), the variable 'ac' may be changed after cache_alloc_refill() and the following kmemleak_erase() may get an incorrect pointer. Update 'ac' after cache_alloc_refill() unconditionally. See the following URL for the discussion of this patch: http://marc.info/?l=linux-kernel&m=125873373124187&w=2 Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-12-06 10:24:03 +02:00
J. R. Okajima	f3d8b53a3a	slab, kmemleak: stop calling kmemleak_erase() unconditionally When the gotten object is NULL (probably due to ENOMEM), kmemleak_erase() is unnecessary here, It just sets NULL to where already is NULL. Add a condition. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-12-06 10:23:05 +02:00
Tim Blechmann	8e15b79cf4	SLAB: Fix unlikely() annotation in __cache_alloc_node() Branch profiling on my nehalem machine showed 99% incorrect branch hints: 28459 7678524 99 __cache_alloc_node slab.c 3551 Discussion on lkml [1] led to the solution to remove this hint. [1] http://patchwork.kernel.org/patch/63517/ Signed-off-by: Tim Blechmann <tim@klingt.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-12-06 10:21:21 +02:00
Pekka Enberg	ce79ddc8e2	SLAB: Fix lockdep annotations for CPU hotplug As reported by Paul McKenney: I am seeing some lockdep complaints in rcutorture runs that include frequent CPU-hotplug operations. The tests are otherwise successful. My first thought was to send a patch that gave each array_cache structure's ->lock field its own struct lock_class_key, but you already have a init_lock_keys() that seems to be intended to deal with this. ------------------------------------------------------------------------ ============================================= [ INFO: possible recursive locking detected ] 2.6.32-rc4-autokern1 #1 --------------------------------------------- syslogd/2908 is trying to acquire lock: (&nc->lock){..-...}, at: [<c0000000001407f4>] .kmem_cache_free+0x118/0x2d4 but task is already holding lock: (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324 other info that might help us debug this: 3 locks held by syslogd/2908: #0: (&u->readlock){+.+.+.}, at: [<c0000000004556f8>] .unix_dgram_recvmsg+0x70/0x338 #1: (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324 #2: (&parent->list_lock){-.-...}, at: [<c000000000140f64>] .__drain_alien_cache+0x50/0xb8 stack backtrace: Call Trace: [c0000000e8ccafc0] [c0000000000101e4] .show_stack+0x70/0x184 (unreliable) [c0000000e8ccb070] [c0000000000afebc] .validate_chain+0x6ec/0xf58 [c0000000e8ccb180] [c0000000000b0ff0] .__lock_acquire+0x8c8/0x974 [c0000000e8ccb280] [c0000000000b2290] .lock_acquire+0x140/0x18c [c0000000e8ccb350] [c000000000468df0] ._spin_lock+0x48/0x70 [c0000000e8ccb3e0] [c0000000001407f4] .kmem_cache_free+0x118/0x2d4 [c0000000e8ccb4a0] [c000000000140b90] .free_block+0x130/0x1a8 [c0000000e8ccb540] [c000000000140f94] .__drain_alien_cache+0x80/0xb8 [c0000000e8ccb5e0] [c0000000001411e0] .kfree+0x214/0x324 [c0000000e8ccb6a0] [c0000000003ca860] .skb_release_data+0xe8/0x104 [c0000000e8ccb730] [c0000000003ca2ec] .__kfree_skb+0x20/0xd4 [c0000000e8ccb7b0] [c0000000003cf2c8] .skb_free_datagram+0x1c/0x5c [c0000000e8ccb830] [c00000000045597c] .unix_dgram_recvmsg+0x2f4/0x338 [c0000000e8ccb920] [c0000000003c0f14] .sock_recvmsg+0xf4/0x13c [c0000000e8ccbb30] [c0000000003c28ec] .SyS_recvfrom+0xb4/0x130 [c0000000e8ccbcb0] [c0000000003bfb78] .sys_recv+0x18/0x2c [c0000000e8ccbd20] [c0000000003ed388] .compat_sys_recv+0x14/0x28 [c0000000e8ccbd90] [c0000000003ee1bc] .compat_sys_socketcall+0x178/0x220 [c0000000e8ccbe30] [c0000000000085d4] syscall_exit+0x0/0x40 This patch fixes the issue by setting up lockdep annotations during CPU hotplug. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-11-30 19:16:08 +02:00
Tejun Heo	1871e52c76	percpu: make percpu symbols under kernel/ and mm/ unique This patch updates percpu related symbols under kernel/ and mm/ such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * kernel/lockdep.c: s/lock_stats/cpu_lock_stats/ * kernel/sched.c: s/init_rq_rt/init_rt_rq_var/ (any better idea?) s/sched_group_cpus/sched_groups/ * kernel/softirq.c: s/ksoftirqd/run_ksoftirqd/a * kernel/softlockup.c: s/()_timestamp/softlockup_\1_ts/ s/watchdog_task/softlockup_watchdog/ s/timestamp/ts/ for local variables kernel/time/timer_stats: s/lookup_lock/tstats_lookup_lock/ * mm/slab.c: s/reap_work/slab_reap_work/ s/reap_node/slab_reap_node/ * mm/vmstat.c: local variable changed to avoid collision with vmstat_work Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: (slab/vmstat) Christoph Lameter <cl@linux-foundation.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de>	2009-10-29 22:34:13 +09:00
Catalin Marinas	c017b4be3e	kmemleak: Simplify the kmemleak_scan_area() function prototype This function was taking non-necessary arguments which can be determined by kmemleak. The patch also modifies the calling sites. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-10-28 15:11:00 +00:00
Catalin Marinas	e7cb55b946	kmemleak: Do not use off-slab management with SLAB_NOLEAKTRACE With the slab allocator, if off-slab management is enabled for the kmem_caches used by kmemleak, it leads to recursive calls into kmemleak_alloc(). Off-slab management can be triggered by other config options increasing the slab size, e.g. DEBUG_PAGEALLOC. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2009-10-28 13:33:08 +00:00
Jan Beulich	4481374ce8	mm: replace various uses of num_physpages by totalram_pages Sizing of memory allocations shouldn't depend on the number of physical pages found in a system, as that generally includes (perhaps a huge amount of) non-RAM pages. The amount of what actually is usable as storage should instead be used as a basis here. Some of the calculations (i.e. those not intending to use high memory) should likely even use (totalram_pages - totalhigh_pages). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Airlie <airlied@linux.ie> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:38 -07:00
Pekka Enberg	ec5a36f94e	SLAB: Fix lockdep annotations Commit 8429db5... ("slab: setup cpu caches later on when interrupts are enabled") broke mm/slab.c lockdep annotations: [ 11.554715] ============================================= [ 11.555249] [ INFO: possible recursive locking detected ] [ 11.555560] 2.6.31-rc1 #896 [ 11.555861] --------------------------------------------- [ 11.556127] udevd/1899 is trying to acquire lock: [ 11.556436] (&nc->lock){-.-...}, at: [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b [ 11.557101] [ 11.557102] but task is already holding lock: [ 11.557706] (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292 [ 11.558109] [ 11.558109] other info that might help us debug this: [ 11.558720] 2 locks held by udevd/1899: [ 11.558983] #0: (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292 [ 11.559734] #1: (&parent->list_lock){-.-...}, at: [<ffffffff810c36c7>] __drain_alien_cache+0x3b/0xbd [ 11.560442] [ 11.560443] stack backtrace: [ 11.561009] Pid: 1899, comm: udevd Not tainted 2.6.31-rc1 #896 [ 11.561276] Call Trace: [ 11.561632] [<ffffffff81065ed6>] __lock_acquire+0x15ec/0x168f [ 11.561901] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f [ 11.562171] [<ffffffff81063c52>] ? trace_hardirqs_on_caller+0x113/0x13e [ 11.562490] [<ffffffff8150c337>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 11.562807] [<ffffffff8106603a>] lock_acquire+0xc1/0xe5 [ 11.563073] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b [ 11.563385] [<ffffffff8150c8fc>] _spin_lock+0x31/0x66 [ 11.563696] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b [ 11.563964] [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b [ 11.564235] [<ffffffff8109bf8c>] ? __free_pages+0x1b/0x24 [ 11.564551] [<ffffffff810c3564>] slab_destroy+0x57/0x5c [ 11.564860] [<ffffffff810c3641>] free_block+0xd8/0x123 [ 11.565126] [<ffffffff810c372e>] __drain_alien_cache+0xa2/0xbd [ 11.565441] [<ffffffff810c3ce5>] kfree+0x14c/0x292 [ 11.565752] [<ffffffff8144a007>] skb_release_data+0xc6/0xcb [ 11.566020] [<ffffffff81449cf0>] __kfree_skb+0x19/0x86 [ 11.566286] [<ffffffff81449d88>] consume_skb+0x2b/0x2d [ 11.566631] [<ffffffff8144cbe0>] skb_free_datagram+0x14/0x3a [ 11.566901] [<ffffffff81462eef>] netlink_recvmsg+0x164/0x258 [ 11.567170] [<ffffffff81443461>] sock_recvmsg+0xe5/0xfe [ 11.567486] [<ffffffff810ab063>] ? might_fault+0xaf/0xb1 [ 11.567802] [<ffffffff81053a78>] ? autoremove_wake_function+0x0/0x38 [ 11.568073] [<ffffffff810d84ca>] ? core_sys_select+0x3d/0x2b4 [ 11.568378] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f [ 11.568693] [<ffffffff81442dc1>] ? sockfd_lookup_light+0x1b/0x54 [ 11.568961] [<ffffffff81444416>] sys_recvfrom+0xa3/0xf8 [ 11.569228] [<ffffffff81063c8a>] ? trace_hardirqs_on+0xd/0xf [ 11.569546] [<ffffffff8100af2b>] system_call_fastpath+0x16/0x1b# Fix that up. Closes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=13654 Tested-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-29 09:57:10 +03:00
Paul E. McKenney	7ed9f7e5db	fix RCU-callback-after-kmem_cache_destroy problem in sl[aou]b Jesper noted that kmem_cache_destroy() invokes synchronize_rcu() rather than rcu_barrier() in the SLAB_DESTROY_BY_RCU case, which could result in RCU callbacks accessing a kmem_cache after it had been destroyed. Cc: <stable@kernel.org> Acked-by: Matt Mackall <mpm@selenic.com> Reported-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-26 12:10:47 +03:00
Benjamin Herrenschmidt	dcce284a25	mm: Extend gfp masking to the page allocator The page allocator also needs the masking of gfp flags during boot, so this moves it out of slab/slub and uses it with the page allocator as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:12:57 -07:00
Pekka Enberg	e03ab9d415	Merge branches 'slab/documentation', 'slab/fixes', 'slob/cleanups' and 'slub/fixes' into for-linus	2009-06-17 08:30:15 +03:00
Linus Torvalds	517d08699b	Merge branch 'akpm' * akpm: (182 commits) fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset fbdev: bfin: fix __dev{init,exit} markings fbdev: bfin: drop unnecessary calls to memset fbdev: bfin-t350mcqb-fb: drop unused local variables fbdev: blackfin has __raw I/O accessors, so use them in fb.h fbdev: s1d13xxxfb: add accelerated bitblt functions tcx: use standard fields for framebuffer physical address and length fbdev: add support for handoff from firmware to hw framebuffers intelfb: fix a bug when changing video timing fbdev: use framebuffer_release() for freeing fb_info structures radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb? s3c-fb: CPUFREQ frequency scaling support s3c-fb: fix resource releasing on error during probing carminefb: fix possible access beyond end of carmine_modedb[] acornfb: remove fb_mmap function mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF mb862xxfb: restrict compliation of platform driver to PPC Samsung SoC Framebuffer driver: add Alpha Channel support atmel-lcdc: fix pixclock upper bound detection offb: use framebuffer_alloc() to allocate fb_info struct ... Manually fix up conflicts due to kmemcheck in mm/slab.c	2009-06-16 19:50:13 -07:00
Mel Gorman	b6e68bc1ba	page allocator: slab: use nr_online_nodes to check for a NUMA platform SLAB currently avoids checking a bitmap repeatedly by checking once and storing a flag. When the addition of nr_online_nodes as a cheaper version of num_online_nodes(), this check can be replaced by nr_online_nodes. (Christoph did a patch that this is lifted almost verbatim from) Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:35 -07:00
Mel Gorman	6484eb3e2a	page allocator: do not check NUMA node ID when the caller knows the node is valid Callers of alloc_pages_node() can optionally specify -1 as a node to mean "allocate from the current node". However, a number of the callers in fast paths know for a fact their node is valid. To avoid a comparison and branch, this patch adds alloc_pages_exact_node() that only checks the nid with VM_BUG_ON(). Callers that know their node is valid are then converted. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> [for the SLOB NUMA bits] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:32 -07:00
Vegard Nossum	722f2a6c87	Merge commit 'linus/master' into HEAD Conflicts: MAINTAINERS Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:50:49 +02:00
Vegard Nossum	b1eeab6768	kmemcheck: add hooks for the page allocator This adds support for tracking the initializedness of memory that was allocated with the page allocator. Highmem requests are not tracked. Cc: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> [build fix for !CONFIG_KMEMCHECK] Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:48:33 +02:00
Pekka Enberg	c175eea466	slab: add hooks for kmemcheck We now have SLAB support for kmemcheck! This means that it doesn't matter whether one chooses SLAB or SLUB, or indeed whether Linus chooses to chuck SLAB or SLUB.. ;-) Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:40:08 +02:00
Pekka Enberg	8eae985f08	slab: move struct kmem_cache to headers Move the SLAB struct kmem_cache definition to <linux/slab_def.h> like with SLUB so kmemcheck can access ->ctor and ->flags. Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-13 08:58:43 +02:00
Pekka Enberg	8429db5c63	slab: setup cpu caches later on when interrupts are enabled Fixes the following boot-time warning: [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x56/0x1bc() [ 0.000000] Hardware name: [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #492 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff8149e021>] ? _spin_unlock+0x4f/0x5c [ 0.000000] [<ffffffff8108f11b>] ? smp_call_function_many+0x56/0x1bc [ 0.000000] [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9 [ 0.000000] [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16 [ 0.000000] [<ffffffff8108f11b>] smp_call_function_many+0x56/0x1bc [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff8108f2be>] smp_call_function+0x3d/0x68 [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff81066fd8>] on_each_cpu+0x31/0x7c [ 0.000000] [<ffffffff810f64f5>] do_tune_cpucache+0x119/0x454 [ 0.000000] [<ffffffff81087080>] ? lockdep_init_map+0x94/0x10b [ 0.000000] [<ffffffff818133b0>] ? kmem_cache_init+0x421/0x593 [ 0.000000] [<ffffffff810f69cf>] enable_cpucache+0x68/0xad [ 0.000000] [<ffffffff818133c3>] kmem_cache_init+0x434/0x593 [ 0.000000] [<ffffffff8180987c>] ? mem_init+0x156/0x161 [ 0.000000] [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9 [ 0.000000] [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae [ 0.000000] [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-12 18:53:58 +03:00
Pekka Enberg	7e85ee0c1d	slab,slub: don't enable interrupts during early boot As explained by Benjamin Herrenschmidt: Oh and btw, your patch alone doesn't fix powerpc, because it's missing a whole bunch of GFP_KERNEL's in the arch code... You would have to grep the entire kernel for things that check slab_is_available() and even then you'll be missing some. For example, slab_is_available() didn't always exist, and so in the early days on powerpc, we used a mem_init_done global that is set form mem_init() (not perfect but works in practice). And we still have code using that to do the test. Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators in early boot code to avoid enabling interrupts. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-12 18:53:33 +03:00
Pekka Enberg	eb91f1d0a5	slab: fix gfp flag in setup_cpu_cache() Fixes the following warning during bootup when compiling with CONFIG_SLAB: [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/lockdep.c:2282 lockdep_trace_alloc+0x91/0xb9() [ 0.000000] Hardware name: [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #491 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff81087d84>] ? lockdep_trace_alloc+0x91/0xb9 [ 0.000000] [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9 [ 0.000000] [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16 [ 0.000000] [<ffffffff81087d84>] lockdep_trace_alloc+0x91/0xb9 [ 0.000000] [<ffffffff810f5b03>] kmem_cache_alloc_node_notrace+0x26/0xdf [ 0.000000] [<ffffffff81487f4e>] ? setup_cpu_cache+0x7e/0x210 [ 0.000000] [<ffffffff81487fe3>] setup_cpu_cache+0x113/0x210 [ 0.000000] [<ffffffff810f73ff>] kmem_cache_create+0x409/0x486 [ 0.000000] [<ffffffff818131c1>] kmem_cache_init+0x232/0x593 [ 0.000000] [<ffffffff8180987c>] ? mem_init+0x156/0x161 [ 0.000000] [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9 [ 0.000000] [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae [ 0.000000] [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-12 18:34:32 +03:00
Linus Torvalds	512626a04e	Merge branch 'for-linus' of git://linux-arm.org/linux-2.6 * 'for-linus' of git://linux-arm.org/linux-2.6: kmemleak: Add the corresponding MAINTAINERS entry kmemleak: Simple testing module for kmemleak kmemleak: Enable the building of the memory leak detector kmemleak: Remove some of the kmemleak false positives kmemleak: Add modules support kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash kmemleak: Add the vmalloc memory allocation/freeing hooks kmemleak: Add the slub memory allocation/freeing hooks kmemleak: Add the slob memory allocation/freeing hooks kmemleak: Add the slab memory allocation/freeing hooks kmemleak: Add documentation on the memory leak detector kmemleak: Add the base support Manual conflict resolution (with the slab/earlyboot changes) in: drivers/char/vt.c init/main.c mm/slab.c	2009-06-11 14:15:57 -07:00
Pekka Enberg	83b519e8b9	slab: setup allocators earlier in the boot sequence This patch makes kmalloc() available earlier in the boot sequence so we can get rid of some bootmem allocations. The bulk of the changes are due to kmem_cache_init() being called with interrupts disabled which requires some changes to allocator boostrap code. Note: 32-bit x86 does WP protect test in mem_init() so we must setup traps before we call mem_init() during boot as reported by Ingo Molnar: We have a hard crash in the WP-protect code: [ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000 [ 0.000000] EDI 00000188 ESI 00000ac7 EBP c17eaf9c ESP c17eaf8c [ 0.000000] EBX 000014e0 EDX 0000000e ECX 01856067 EAX 00000001 [ 0.000000] err 00000003 EIP c10135b1 CS 00000060 flg 00010002 [ 0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc [ 0.000000] 00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0 [ 0.000000] c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003 [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203 [ 0.000000] Call Trace: [ 0.000000] [<c15357c2>] ? printk+0x14/0x16 [ 0.000000] [<c10135b1>] ? do_test_wp_bit+0x19/0x23 [ 0.000000] [<c17fd410>] ? test_wp_bit+0x26/0x64 [ 0.000000] [<c17fd7e5>] ? mem_init+0x1ba/0x1d8 [ 0.000000] [<c17f1668>] ? start_kernel+0x164/0x2f7 [ 0.000000] [<c17f1322>] ? unknown_bootoption+0x0/0x19c [ 0.000000] [<c17f106a>] ? __init_begin+0x6a/0x6f Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-11 19:15:56 +03:00
Catalin Marinas	d5cff63529	kmemleak: Add the slab memory allocation/freeing hooks This patch adds the callbacks to kmemleak_(alloc\|free) functions from the slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to avoid recursive calls to kmemleak when it allocates its own data structures. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-11 17:03:29 +01:00
Ron Lee	6746136520	slab: fix generic PAGE_POISONING conflict with SLAB_RED_ZONE A generic page poisoning mechanism was added with commit: `6a11f75b6a` which destructively poisons full pages with a bitpattern. On arches where PAGE_POISONING is used, this conflicts with the slab redzone checking enabled by DEBUG_SLAB, scribbling bits all over its magic words and making it complain about that quite emphatically. On x86 (and I presume at present all the other arches which set ARCH_SUPPORTS_DEBUG_PAGEALLOC too), the kernel_map_pages() operation is non destructive so it can coexist with the other DEBUG_SLAB mechanisms just fine. This patch favours the expensive full page destruction test for cases where there is a collision and it is explicitly selected. Signed-off-by: Ron Lee <ron@debian.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-05-22 11:01:12 +03:00
Zhaolei	02af61bb50	tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part Impact: refactor code for future changes Current kmemtrace.h is used both as header file of kmemtrace and kmem's tracepoints definition. Tracepoints' definition file may be used by other code, and should only have definition of tracepoint. We can separate include/trace/kmemtrace.h into 2 files: include/linux/kmemtrace.h: header file for kmemtrace include/trace/kmem.h: definition of kmem tracepoints Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <49DEE68A.5040902@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-12 15:22:55 +02:00
Linus Torvalds	12fe32e4f9	Merge branch 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kmemtrace: trace kfree() calls with NULL or zero-length objects kmemtrace: small cleanups kmemtrace: restore original tracing data binary format, improve ABI kmemtrace: kmemtrace_alloc() must fill type_id kmemtrace: use tracepoints kmemtrace, rcu: don't include unnecessary headers, allow kmemtrace w/ tracepoints kmemtrace, rcu: fix rcupreempt.c data structure dependencies kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_unlzma.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_bunzip2.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_inflate.c kmemtrace, squashfs: fix slab.h dependency problem in squasfs kmemtrace, befs: fix slab.h dependency problem kmemtrace, security: fix linux/key.h header file dependencies kmemtrace, fs: fix linux/fdtable.h header file dependencies kmemtrace, fs: uninline simple_transaction_set() kmemtrace, fs, security: move alloc_secdata() and free_secdata() to linux/security.h	2009-04-06 13:30:00 -07:00
Linus Torvalds	714f83d5d9	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits) tracing, net: fix net tree and tracing tree merge interaction tracing, powerpc: fix powerpc tree and tracing tree interaction ring-buffer: do not remove reader page from list on ring buffer free function-graph: allow unregistering twice trace: make argument 'mem' of trace_seq_putmem() const tracing: add missing 'extern' keywords to trace_output.h tracing: provide trace_seq_reserve() blktrace: print out BLK_TN_MESSAGE properly blktrace: extract duplidate code blktrace: fix memory leak when freeing struct blk_io_trace blktrace: fix blk_probes_ref chaos blktrace: make classic output more classic blktrace: fix off-by-one bug blktrace: fix the original blktrace blktrace: fix a race when creating blk_tree_root in debugfs blktrace: fix timestamp in binary output tracing, Text Edit Lock: cleanup tracing: filter fix for TRACE_EVENT_FORMAT events ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release() x86: kretprobe-booster interrupt emulation code fix ... Fix up trivial conflicts in arch/parisc/include/asm/ftrace.h include/linux/memory.h kernel/extable.c kernel/module.c	2009-04-05 11:04:19 -07:00
Linus Torvalds	90975ef712	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits) cpumask: remove cpumask allocation from idle_balance, fix numa, cpumask: move numa_node_id default implementation to topology.h, fix cpumask: remove cpumask allocation from idle_balance x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus x86: cpumask: update 32-bit APM not to mug current->cpus_allowed x86: microcode: cleanup x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash numa, cpumask: move numa_node_id default implementation to topology.h cpumask: convert node_to_cpumask_map[] to cpumask_var_t cpumask: remove x86 cpumask_t uses. cpumask: use cpumask_var_t in uv_flush_tlb_others. cpumask: remove cpumask_t assignment from vector_allocation_domain() cpumask: make Xen use the new operators. cpumask: clean up summit's send_IPI functions cpumask: use new cpumask functions throughout x86 x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t cpumask: convert node_to_cpumask_map[] to cpumask_var_t x86: unify 32 and 64-bit node_to_cpumask_map ...	2009-04-05 10:33:07 -07:00
Pekka Enberg	2121db74ba	kmemtrace: trace kfree() calls with NULL or zero-length objects Impact: also output kfree(NULL) entries This patch moves the trace_kfree() calls before the ZERO_OR_NULL_PTR check so that we can trace call-sites that call kfree() with NULL many times which might be an indication of a bug. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <1237971957.30175.18.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:23:10 +02:00
Eduard - Gabriel Munteanu	ca2b84cb3c	kmemtrace: use tracepoints kmemtrace now uses tracepoints instead of markers. We no longer need to use format specifiers to pass arguments. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> [ folded: Use the new TP_PROTO and TP_ARGS to fix the build. ] [ folded: fix build when CONFIG_KMEMTRACE is disabled. ] [ folded: define tracepoints when CONFIG_TRACEPOINTS is enabled. ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <ae61c0f37156db8ec8dc0d5778018edde60a92e3.1237813499.git.eduard.munteanu@linux360.ro> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:23:06 +02:00
Jean Delvare	bf6aede712	workqueue: add to_delayed_work() helper function It is a fairly common operation to have a pointer to a work and to need a pointer to the delayed work it is contained in. In particular, all delayed works which want to rearm themselves will have to do that. So it would seem fair to offer a helper function for this operation. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:50 -07:00
Rusty Russell	558f6ab910	Merge branch 'cpumask-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip Conflicts: arch/x86/include/asm/topology.h drivers/oprofile/buffer_sync.c (Both cases: changed in Linus' tree, removed in Ingo's).	2009-03-31 13:33:50 +10:30
Linus Torvalds	c4e1aa67ed	Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits) lockdep: fix deadlock in lockdep_trace_alloc lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB lockdep: annotate reclaim context (__GFP_NOFS), fix lockdep: build fix for !PROVE_LOCKING lockstat: warn about disabled lock debugging lockdep: use stringify.h lockdep: simplify check_prev_add_irq() lockdep: get_user_chars() redo lockdep: simplify get_user_chars() lockdep: add comments to mark_lock_irq() lockdep: remove macro usage from mark_held_locks() lockdep: fully reduce mark_lock_irq() lockdep: merge the !_READ mark_lock_irq() helpers lockdep: merge the _READ mark_lock_irq() helpers lockdep: simplify mark_lock_irq() helpers #3 lockdep: further simplify mark_lock_irq() helpers lockdep: simplify the mark_lock_irq() helpers lockdep: split up mark_lock_irq() lockdep: generate usage strings lockdep: generate the state bit definitions ...	2009-03-30 17:17:35 -07:00
Rusty Russell	a70f730282	cpumask: replace node_to_cpumask with cpumask_of_node. Impact: cleanup node_to_cpumask (and the blecherous node_to_cpumask_ptr which contained a declaration) are replaced now everyone implements cpumask_of_node. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:46 +10:30
Ingo Molnar	28b1bd1cbc	Merge branch 'core/locking' into tracing/ftrace	2009-03-04 18:49:19 +01:00
Nick Piggin	cf40bd16fd	lockdep: annotate reclaim context (__GFP_NOFS) Here is another version, with the incremental patch rolled up, and added reclaim context annotation to kswapd, and allocation tracing to slab allocators (which may only ever reach the page allocator in rare cases, so it is good to put annotations here too). Haven't tested this version as such, but it should be getting closer to merge worthy ;) -- After noticing some code in mm/filemap.c accidentally perform a __GFP_FS allocation when it should not have been, I thought it might be a good idea to try to catch this kind of thing with lockdep. I coded up a little idea that seems to work. Unfortunately the system has to actually be in __GFP_FS page reclaim, then take the lock, before it will mark it. But at least that might still be some orders of magnitude more common (and more debuggable) than an actual deadlock condition, so we have some improvement I hope (the concept is no less complete than discovery of a lock's interrupt contexts). I guess we could even do the same thing with __GFP_IO (normal reclaim), and even GFP_NOIO locks too... but filesystems will have the most locks and fiddly code paths, so let's start there and see how it goes. It seems to work. I did a quick test. ================================= [ INFO: inconsistent lock state ] 2.6.28-rc6-00007-ged31348-dirty #26 --------------------------------- inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage. modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd] {in-reclaim-W} state was registered at: [<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60 [<ffffffff80268f71>] lock_acquire+0x91/0xc0 [<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310 [<ffffffffa002002b>] brd_init+0x2b/0x216 [brd] [<ffffffff8020903b>] _stext+0x3b/0x170 [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0 [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 3929 hardirqs last enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310 hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310 softirqs last enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0 softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0 other info that might help us debug this: 1 lock held by modprobe/8526: #0: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd] stack backtrace: Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26 Call Trace: [<ffffffff80265483>] print_usage_bug+0x193/0x1d0 [<ffffffff80266530>] mark_lock+0xaf0/0xca0 [<ffffffff80266735>] mark_held_locks+0x55/0xc0 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60 [<ffffffff80285005>] __alloc_pages_internal+0x475/0x580 [<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310 [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffffa002006a>] brd_init+0x6a/0x216 [brd] [<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd] [<ffffffff8020903b>] _stext+0x3b/0x170 [<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10 [<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180 [<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190 [<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0 [<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-14 23:27:49 +01:00
Ingo Molnar	1c511f740f	Merge branches 'tracing/ftrace', 'tracing/ring-buffer', 'tracing/sysprof', 'tracing/urgent' and 'linus' into tracing/core	2009-02-13 10:25:18 +01:00
Kirill A. Shutemov	b1aabecd55	mm: Export symbol ksize() Commit `7b2cd92adc` ("crypto: api - Fix zeroing on free") added modular user of ksize(). Export that to fix crypto.ko compilation. Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-02-12 17:50:46 +02:00
Ingo Molnar	3d7a96f5a4	Merge branch 'linus' into tracing/kmemtrace2	2009-01-06 09:53:05 +01:00
Rusty Russell	174596a0b9	cpumask: convert mm/ Impact: Use new API Convert kernel mm functions to use struct cpumask. We skip include/linux/percpu.h and mm/allocpercpu.c, which are in flux. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org>	2009-01-01 10:12:29 +10:30
Ingo Molnar	f09eac9034	tracing/kmemtrace: fix typo Impact: build fix Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-31 09:43:46 +01:00
Ingo Molnar	818fa7f390	Merge branch 'tracing/kmemtrace' into tracing/kmemtrace2	2008-12-31 08:19:48 +01:00
Ingo Molnar	5fdf7e5975	Merge branch 'linus' into tracing/kmemtrace Conflicts: mm/slub.c	2008-12-31 08:14:29 +01:00
Ingo Molnar	3fd4bc015e	tracing/kmemtrace: export kmemtrace_mark_alloc_node() / kmemtrace_mark_free() Impact: build fix Also fix up Kconfig dependencies and include files. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 16:06:00 +01:00
Ingo Molnar	2a38b1c4f1	kmemtrace: move #include lines Impact: avoid conflicts with kmemcheck kmemcheck modifies the same area of slab.c and slub.c - move the include lines up a bit. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 06:56:21 +01:00
Eduard - Gabriel Munteanu	36555751c6	kmemtrace: SLAB hooks. This adds hooks for the SLAB allocator, to allow tracing with kmemtrace. We also convert some inline functions to __always_inline to make sure _RET_IP_, which expands to __builtin_return_address(0), always works as expected. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 15:34:04 +02:00
Eduard - Gabriel Munteanu	35995a4d81	SLUB: Replace __builtin_return_address(0) with _RET_IP_. This patch replaces __builtin_return_address(0) with _RET_IP_, since a previous patch moved _RET_IP_ and _THIS_IP_ to include/linux/kernel.h and they're widely available now. This makes for shorter and easier to read code. [penberg@cs.helsinki.fi: remove _RET_IP_ casts to void pointer] Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 15:33:59 +02:00
Pekka Enberg	3c506efd7e	Merge branch 'topic/failslab' into for-linus Conflicts: mm/slub.c Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 11:47:05 +02:00
Pekka Enberg	fd37617e69	Merge branches 'topic/fixes', 'topic/cleanups' and 'topic/documentation' into for-linus	2008-12-29 11:45:47 +02:00
Pekka Enberg	8759ec50a6	slab: remove GFP_THISNODE clearing from alloc_slabmgmt() Commit `6cb062296f` ("Categorize GFP flags") left one call-site in alloc_slabmgmt() to clear GFP_THISNODE instead of GFP_CONSTRAINT_MASK. Unfortunately, that ends up clearing __GFP_NOWARN and __GFP_NORETRY as well which is not what we want. As the only caller of alloc_slabmgmt() already clears GFP_CONSTRAINT_MASK before passing local_flags to it, we can just remove the clearing of GFP_THISNODE. This patch should fix spurious page allocation failure warnings on the mempool_alloc() path. See the following URL for the original discussion of the bug: http://lkml.org/lkml/2008/10/27/100 Acked-by: Christoph Lameter <cl@linux-foundation.org> Reported-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 11:40:53 +02:00
Akinobu Mita	773ff60e84	SLUB: failslab support Currently fault-injection capability for SLAB allocator is only available to SLAB. This patch makes it available to SLUB, too. [penberg@cs.helsinki.fi: unify slab and slub implementations] Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-12-29 11:27:46 +02:00
Catalin Marinas	249da16658	slab: Update the kmem_cache_create documentation regarding the name parameter kmem_cache implementations like slub are allowed to merge multiple caches but only the initial name is preserved. Therefore, kmem_cache_name() is not guaranteed to return the same pointer passed to the former function. This patch updates the documentation to make this clearer. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-11-26 16:48:47 +02:00
roel kluin	249b9f331e	slab: unsigned slabp->inuse cannot be less than 0 unsigned slabp->inuse cannot be less than 0 Acked-by: Christoph Lameter <cl@linux-foundation.org Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-11-26 16:47:26 +02:00
Eduard - Gabriel Munteanu	ce71e27c6f	SLUB: Replace __builtin_return_address(0) with _RET_IP_. This patch replaces __builtin_return_address(0) with _RET_IP_, since a previous patch moved _RET_IP_ and _THIS_IP_ to include/linux/kernel.h and they're widely available now. This makes for shorter and easier to read code. [penberg@cs.helsinki.fi: remove _RET_IP_ casts to void pointer] Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-11-26 16:47:25 +02:00
Alexey Dobriyan	7b3c3a50a3	proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c Lose dummy ->write hook in case of SLUB, it's possible now. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-10-23 15:20:06 +04:00
Alexey Dobriyan	a0ec95a8e6	proc: move /proc/slab_allocators boilerplate to mm/slab.c Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-10-23 15:17:27 +04:00
Adrian Bunk	231367fd9b	mm: unexport ksize This patch removes the obsolete and no longer used exports of ksize. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-29 23:44:26 +03:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Rabin Vincent	e79aec291d	slab: rename slab_destroy_objs With the removal of destructors, slab_destroy_objs no longer actually destroys any objects, making the kernel doc incorrect and the function name misleading. In keeping with the other debug functions, rename it to slab_destroy_debugcheck and drop the kernel doc. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-07-15 20:36:02 +03:00
Jens Axboe	15c8b6c1aa	on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:38 +02:00
Christoph Lameter	481c5346d0	Slab: Fix memory leak in fallback_alloc() The zonelist patches caused the loop that checks for available objects in permitted zones to not terminate immediately. One object per zone per allocation may be allocated and then abandoned. Break the loop when we have successfully allocated one object. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-21 16:51:02 -07:00
Harvey Harrison	d40cee245f	mm: remove remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:53 -07:00
Thomas Gleixner	3ac7fe5a4a	infrastructure to debug (dynamic) objects We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers a warning message and a stack trace is printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:53 -07:00
Pekka Enberg	1b27d05b6e	mm: move cache_line_size() to <linux/cache.h> Not all architectures define cache_line_size() so as suggested by Andrew move the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Mel Gorman	dd1a239f6f	mm: have zonelist contains structs with both a zone pointer and zone_idx Filtering zonelists requires very frequent use of zone_idx(). This is costly as it involves a lookup of another structure and a substraction operation. As the zone_idx is often required, it should be quickly accessible. The node idx could also be stored here if it was found that accessing zone->node is significant which may be the case on workloads where nodemasks are heavily used. This patch introduces a struct zoneref to store a zone pointer and a zone index. The zonelist then consists of an array of these struct zonerefs which are looked up as necessary. Helpers are given for accessing the zone index as well as the node index. [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers] [hugh@veritas.com: mm-have-zonelist: fix memcg ooms] [hugh@veritas.com: just return do_try_to_free_pages] [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	54a6eb5c47	mm: use two zonelist that are filtered by GFP mask Currently a node has two sets of zonelists, one for each zone type in the system and a second set for GFP_THISNODE allocations. Based on the zones allowed by a gfp mask, one of these zonelists is selected. All of these zonelists consume memory and occupy cache lines. This patch replaces the multiple zonelists per-node with two zonelists. The first contains all populated zones in the system, ordered by distance, for fallback allocations when the target/preferred node has no free pages. The second contains all populated zones in the node suitable for GFP_THISNODE allocations. An iterator macro is introduced called for_each_zone_zonelist() that interates through each zone allowed by the GFP flags in the selected zonelist. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	0e88460da6	mm: introduce node_zonelist() for accessing the zonelist for a GFP mask Introduce a node_zonelist() helper function. It is used to lookup the appropriate zonelist given a node and a GFP mask. The patch on its own is a cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If necessary, it can be merged with the next patch in this set without problems. Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mike Travis	c5f59f0833	nodemask: use new node_to_cpumask_ptr function * Use new node_to_cpumask_ptr. This creates a pointer to the cpumask for a given node. This definition is in mm patch: asm-generic-add-node_to_cpumask_ptr-macro.patch * Use new set_cpus_allowed_ptr function. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function [x86/latest]: x86: add cpus_scnprintf function Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Greg Banks <gnb@melbourne.sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00
Daniel Yeisley	ec1f5eeeb5	slab: fix cache_cache bootstrap in kmem_cache_init() Commit `556a169dab` ("slab: fix bootstrap on memoryless node") introduced bootstrap-time cache_cache list3s for all nodes but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This patch fixes list_add() corruption in mm/slab.c seen on the ES7000. Cc: Mel Gorman <mel@csn.ul.ie> Cc: Olaf Hering <olaf@aepfle.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-26 10:44:17 -07:00
Randy Dunlap	7682486b3e	mm: fix various kernel-doc comments Fix various kernel-doc notation in mm/: filemap.c: add function short description; convert 2 to kernel-doc fremap.c: change parameter 'prot' to @prot pagewalk.c: change "-" in function parameters to ":" slab.c: fix short description of kmem_ptr_validate() swap.c: fix description & parameters of put_pages_list() swap_state.c: fix function parameters vmalloc.c: change "@returns" to "Returns:" since that is not a parameter Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:35 -07:00
Joe Korty	6d2144d355	slab: NUMA slab allocator migration bugfix NUMA slab allocator cpu migration bugfix The NUMA slab allocator (specifically, cache_alloc_refill) is not refreshing its local copies of what cpu and what numa node it is on, when it drops and reacquires the irq block that it inherited from its caller. As a result those values become invalid if an attempt to migrate the process to another numa node occured while the irq block had been dropped. The solution is to make cache_alloc_refill reload these variables whenever it drops and reacquires the irq block. The error is very difficult to hit. When it does occur, one gets the following oops + stack traceback bits in check_spinlock_acquired: kernel BUG at mm/slab.c:2417 cache_alloc_refill+0xe6 kmem_cache_alloc+0xd0 ... This patch was developed against 2.6.23, ported to and compiled-tested only against 2.6.25-rc4. Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-06 16:21:50 -08:00
Joe Perches	1c61fc40fc	slab - use angle brackets for include of kmalloc_sizes.h Make them all use angle brackets and the directory name. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-06 16:21:49 -08:00
Christoph Lameter	9ac33b2b74	slab numa fallback logic: Do not pass unfiltered flags to page allocator The NUMA fallback logic should be passing local_flags to kmem_get_pages() and not simply the flags passed in. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-03-06 16:21:49 -08:00
Marcin Slusarz	e51bfd0ad1	slab: avoid double initialization & do initialization in 1 place - alloc_slabmgmt: initialize all slab fields in 1 place - slab->nodeid was initialized twice: in alloc_slabmgmt and immediately after it in cache_grow Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> CC: Christoph Lameter <clameter@sgi.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>	2008-02-14 15:30:01 -08:00
Gautham R Shenoy	95402b3829	cpu-hotplug: replace per-subsystem mutexes with get_online_cpus() This patch converts the known per-subsystem mutexes to get_online_cpus put_online_cpus. It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug notification events. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:02 +01:00
Pekka Enberg	556a169dab	slab: fix bootstrap on memoryless node If the node we're booting on doesn't have memory, bootstrapping kmalloc() caches resorts to fallback_alloc() which requires ->nodelists set for all nodes. Fix that by calling set_up_list3s() for CACHE_CACHE in kmem_cache_init(). As kmem_getpages() is called with GFP_THISNODE set, this used to work before because of breakage in 2.6.22 and before with GFP_THISNODE returning pages from the wrong node if a node had no memory. So it may have worked accidentally and in an unsafe manner because the pages would have been associated with the wrong node which could trigger bug ons and locking troubles. Tested-by: Mel Gorman <mel@csn.ul.ie> Tested-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [ With additional one-liner by Olaf Hering - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-25 08:30:36 -08:00
Mel Gorman	9c09a95cf4	slab: partially revert list3 changes Partial revert the changes made by `04231b3002` to the kmem_list3 management. On a machine with a memoryless node, this BUG_ON was triggering static void ____cache_alloc_node(struct kmem_cache cachep, gfp_t flags, int nodeid) { struct list_head entry; struct slab slabp; struct kmem_list3 l3; void obj; int x; l3 = cachep->nodelists[nodeid]; BUG_ON(!l3); Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-24 08:07:27 -08:00
Linus Torvalds	158a962422	Unify /proc/slabinfo configuration Both SLUB and SLAB really did almost exactly the same thing for /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's. This just creates a common CONFIG_SLABINFO that is enabled by both SLUB and SLAB, and shares all the setup code. Maybe SLOB will want this some day too. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-02 13:04:48 -08:00
Tetsuo Handa	f8fcc93319	Add EXPORT_SYMBOL(ksize); mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't. It's used by binfmt_flat, which can be built as a module. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Christoph Lameter <clameter@sgi.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-05 09:21:18 -08:00
Matthew Wilcox	80cbd911ca	Fix kmem_cache_free performance regression in slab The database performance group have found that half the cycles spent in kmem_cache_free are spent in this one call to BUG_ON. Moving it into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a performance win of almost 0.5% on their particular benchmark. The call was added as part of commit `ddc2e812d5` with the comment that "overhead should be minimal". It may have been minimal at the time, but it isn't now. [ Quoth Pekka Enberg: "I don't think the BUG_ON per se caused the performance regression but rather the virt_to_head_page() changes to virt_to_cache() that were added later." ] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Pekka J Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-30 08:08:05 -08:00
Akinobu Mita	cc550defe9	slab: fix typo in allocation failure handling This patch fixes wrong array index in allocation failure handling. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:36 -08:00
Simon Arlott	183ff22bb6	spelling fixes: mm/ Spelling fixes in mm/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:27:18 +02:00
Akinobu Mita	12d00f6a12	cpu hotplug: slab: fix memory leak in cpu hotplug error path This patch fixes memory leak in error path. In reality, we don't need to call cpuup_canceled(cpu) for now. But upcoming cpu hotplug error handling change needs this. Cc: Christoph Lameter <clameter@sgi.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:21 -07:00
Akinobu Mita	fbf1e473bd	cpu hotplug: slab: cleanup cpuup_callback() cpuup_callback() is too long. This patch factors out CPU_UP_CANCELLED and CPU_UP_PREPARE handlings from cpuup_callback(). Cc: Christoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:21 -07:00
Robert P. J. Day	bda5b655fe	Delete gcc-2.95 compatible structure definition. Since nothing earlier than gcc-3.2 is supported for kernel compilation, that 2.95 hack can be removed. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:58 -07:00
Christoph Lameter	4ba9b9d0ba	Slab API: remove useless ctor parameter and reorder parameters Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void object, struct kmem_cache s, unsigned long flags) to ctor(struct kmem_cache s, void object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:45 -07:00
Mel Gorman	e12ba74d8f	Group short-lived and reclaimable kernel allocations This patch marks a number of allocations that are either short-lived such as network buffers or are reclaimable such as inode allocations. When something like updatedb is called, long-lived and unmovable kernel allocations tend to be spread throughout the address space which increases fragmentation. This patch groups these allocations together as much as possible by adding a new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be reclaimed on demand, but not moved. i.e. they can be migrated by deleting them and re-reading the information from elsewhere. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:00 -07:00
Christoph Lameter	6cb062296f	Categorize GFP flags The function of GFP_LEVEL_MASK seems to be unclear. In order to clear up the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP flags: GFP_RECLAIM_MASK Flags used to control page allocator reclaim behavior. GFP_CONSTRAINT_MASK Flags used to limit where allocations can occur. GFP_SLAB_BUG_MASK Flags that the slab allocator BUG()s on. These replace the uses of GFP_LEVEL mask in the slab allocators and in vmalloc.c. The use of the flags not included in these sets may occur as a result of a slab allocation standing in for a page allocation when constructing scatter gather lists. Extraneous flags are cleared and not passed through to the page allocator. __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will now be ignored if passed to a slab allocator. Change the allocation of allocator meta data in SLAB and vmalloc to not pass through flags listed in GFP_CONSTRAINT_MASK. SLAB already removes the __GFP_THISNODE flag for such allocations. Generalize that to also cover vmalloc. The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL. The impact of allocator metadata placement on access latency to the cachelines of the object itself is minimal since metadata is only referenced on alloc and free. The attempt is still made to place the meta data optimally but we consistently allow fallback both in SLAB and vmalloc (SLUB does not need to allocate metadata like that). Allocator metadata may serve multiple in kernel users and thus should not be subject to the limitations arising from a single allocation context. [akpm@linux-foundation.org: fix fallback_alloc()] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:59 -07:00
Christoph Lameter	04231b3002	Memoryless nodes: Slab support Slab should not allocate control structures for nodes without memory. This may seem to work right now but its unreliable since not all allocations can fall back due to the use of GFP_THISNODE. Switching a few for_each_online_node's to N_NORMAL_MEMORY will allow us to only allocate for nodes that have regular memory. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:58 -07:00
Christoph Lameter	ef8b4520bd	Slab allocators: fail if ksize is called with a NULL parameter A NULL pointer means that the object was not allocated. One cannot determine the size of an object that has not been allocated. Currently we return 0 but we really should BUG() on attempts to determine the size of something nonexistent. krealloc() interprets NULL to mean a zero sized object. Handle that separately in krealloc(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:53 -07:00
Siddha, Suresh B	1807a1aaf5	slab: skip calling cache_free_alien() when the platform is not numa capable Skip calling cache_free_alien() when the platform is not numa capable. This will avoid cache misses that happen while accessing slabp (which is per page memory reference) to get nodeid. Instead use a global variable to skip the call, which is mostly likely to be present in the cache. This gives a 0.8% performance boost with the database oltp workload on a quad-core SMP platform and by any means the number is not small :) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:46 -07:00
Andrew Morton	b8c1c5da15	slab: correctly handle __GFP_ZERO Use the correct local variable when calling into the page allocator. Local `flags' can have __GFP_ZERO set, which causes us to pass __GFP_ZERO into the page allocator, possibly from illegal contexts. The page allocator will later do prep_zero_page()->kmap_atomic(..., KM_USER0) from irq contexts and will then go BUG. Cc: Mike Galbraith <efault@gmx.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-24 12:24:59 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Linus Torvalds	a5c96d8a1c	Fix up non-NUMA SLAB configuration for zero-sized allocations I suspect Christoph tested his code only in the NUMA configuration, for the combination of SLAB+non-NUMA the zero-sized kmalloc's would not work. Of course, this would only trigger in configurations where those zero- sized allocations happen (not very common), so that may explain why it wasn't more widely noticed. Seen by by Andi Kleen under qemu, and there seems to be a report by Michael Tsirkin on it too. Cc: Andi Kleen <ak@suse.de> Cc: Roland Dreier <rdreier@cisco.com> Cc: Michael S. Tsirkin <mst@dev.mellanox.co.il> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 13:17:15 -07:00
David Howells	ea02e3dde3	FRV: work around a possible compiler bug Work around a possible bug in the FRV compiler. What appears to be happening is that gcc resolves the __builtin_constant_p() in kmalloc() to true, but then fails to reduce the therefore constant conditions in the if-statements it guards to constant results. When compiling with -O2 or -Os, one single spurious error crops up in cpuup_callback() in mm/slab.c. This can be avoided by making the memsize variable const. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:50 -07:00
Tejun Heo	9281acea6a	kallsyms: make KSYM_NAME_LEN include space for trailing '\0' KSYM_NAME_LEN is peculiar in that it does not include the space for the trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating buffer. This is nonsense and error-prone. Moreover, when the caller forgets that it's very likely to subtly bite back by corrupting the stack because the last position of the buffer is always cleared to zero. This patch increments KSYM_NAME_LEN by one and updates code accordingly. * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro is fixed. * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together, MODULE_NAME_LEN was treated as if it didn't include space for the trailing '\0'. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Paulo Marques <pmarques@grupopie.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:03 -07:00
Christoph Lameter	81cda66261	Slab allocators: Cleanup zeroing allocations It becomes now easy to support the zeroing allocs with generic inline functions in slab.h. Provide inline definitions to allow the continued use of kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing functions from the slab allocators and util.c. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:01 -07:00
Christoph Lameter	d07dbea464	Slab allocators: support __GFP_ZERO in all allocators A kernel convention for many allocators is that if __GFP_ZERO is passed to an allocator then the allocated memory should be zeroed. This is currently not supported by the slab allocators. The inconsistency makes it difficult to implement in derived allocators such as in the uncached allocator and the pool allocators. In addition the support zeroed allocations in the slab allocators does not have a consistent API. There are no zeroing allocator functions for NUMA node placement (kmalloc_node, kmem_cache_alloc_node). The zeroing allocations are only provided for default allocs (kzalloc, kmem_cache_zalloc_node). __GFP_ZERO will make zeroing universally available and does not require any addititional functions. So add the necessary logic to all slab allocators to support __GFP_ZERO. The code is added to the hot path. The gfp flags are on the stack and so the cacheline is readily available for checking if we want a zeroed object. Zeroing while allocating is now a frequent operation and we seem to be gradually approaching a 1-1 parity between zeroing and not zeroing allocs. The current tree has 3476 uses of kmalloc vs 2731 uses of kzalloc. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:01 -07:00
Christoph Lameter	6cb8f91320	Slab allocators: consistent ZERO_SIZE_PTR support and NULL result semantics Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the allocators. Move ZERO_SIZE_PTR related stuff into slab.h. Make ZERO_SIZE_PTR work for all slab allocators and get rid of the WARN_ON_ONCE(size == 0) that is still remaining in SLAB. Make slub return NULL like the other allocators if a too large memory segment is requested via __kmalloc. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:01 -07:00
Christoph Lameter	ef2ad80c7d	Slab allocators: consolidate code for krealloc in mm/util.c The size of a kmalloc object is readily available via ksize(). ksize is provided by all allocators and thus we can implement krealloc in a generic way. Implement krealloc in mm/util.c and drop slab specific implementations of krealloc. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:01 -07:00
Adrian Bunk	897e679b17	mm/slab.c: start_cpu_timer() should be __cpuinit start_cpu_timer() should be __cpuinit (which also matches what it's callers are). __devinit didn't cause problems, it simply wasted a few bytes of memory for the common CONFIG_HOTPLUG_CPU=n case. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:36 -07:00
Pavel Emelianov	b92151bab9	Make /proc/slabinfo use seq_list_xxx helpers This entry prints a header in .start callback. This is OK, but the more elegant solution would be to move this into the .show callback and use seq_list_start_head() in .start one. I have left it as is in order to make the patch just switch to new API and noting more. [adobriyan@sw.ru: Wrong pointer was used as kmem_cache pointer] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-16 09:05:35 -07:00
David Woodhouse	87a927c715	Fix slab redzone alignment Commit `b46b8f19c9` fixed a couple of bugs by switching the redzone to 64 bits. Unfortunately, it neglected to ensure that the _second_ redzone, after the slab object, is aligned correctly. This caused illegal instruction faults on sparc32, which for some reason not entirely clear to me are not trapped and fixed up. Two things need to be done to fix this: - increase the object size, rounding up to alignof(long long) so that the second redzone can be aligned correctly. - If SLAB_STORE_USER is set but alignof(long long)==8, allow a full 64 bits of space for the user word at the end of the buffer, even though we may not _use_ the whole 64 bits. This patch should be a no-op on any 64-bit architecture or any 32-bit architecture where alignof(long long) == 4. Of the others, it's tested on ppc32 by myself and a very similar patch was tested on sparc32 by Mark Fortescue, who reported the new problem. Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to the new sizes. Again noticed by Mark. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-05 15:54:13 -07:00
Christoph Lameter	17022220dd	SLAB: remove WARN_ON_ONCE for zero sized objects for 2.6.22 release We agreed to remove the WARN_ON_ONCE before 2.6.22 is released. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-01 12:29:43 -07:00
Christoph Lameter	3cdc0ed0ce	slab: fix alien cache handling cache_free_alien must be called regardless if we use alien caches or not. cache_free_alien() will do the right thing if there are no alien caches available. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: Pekka J Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-08 17:23:32 -07:00
Sam Ravnborg	38bdc32af4	mm/slab: fix section mismatch warning Use the new __init_refok marker to avoid the section mismatch warning from slab.c Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2007-05-19 09:11:58 +02:00
Christoph Lameter	0aa817f078	Slab allocators: define common size limitations Currently we have a maze of configuration variables that determine the maximum slab size. Worst of all it seems to vary between SLAB and SLUB. So define a common maximum size for kmalloc. For conveniences sake we use the maximum size ever supported which is 32 MB. We limit the maximum size to a lower limit if MAX_ORDER does not allow such large allocations. For many architectures this patch will have the effect of adding large kmalloc sizes. x86_64 adds 5 new kmalloc sizes. So a small amount of memory will be needed for these caches (contemporary SLAB has dynamically sizeable node and cpu structure so the waste is less than in the past) Most architectures will then be able to allocate object with sizes up to MAX_ORDER. We have had repeated breakage (in fact whenever we doubled the number of supported processors) on IA64 because one or the other struct grew beyond what the slab allocators supported. This will avoid future issues and f.e. avoid fixes for 2k and 4k cpu support. CONFIG_LARGE_ALLOCS is no longer necessary so drop it. It fixes sparc64 with SLAB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:04 -07:00
Christoph Lameter	a35afb830f	Remove SLAB_CTOR_CONSTRUCTOR SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:04 -07:00
Christoph Lameter	0b44f7a5b5	slab: warn on zero-length allocations slub warns on this, and we're working on making kmalloc(0) return NULL. Let's make slab warn as well so our testers detect such callers more rapidly. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:03 -07:00
Christoph Lameter	c59def9f22	Slab allocators: Drop support for destructors There is no user of destructors left. There is no reason why we should keep checking for destructors calls in the slab allocators. The RFC for this patch was discussed at http://marc.info/?l=linux-kernel&m=117882364330705&w=2 Destructors were mainly used for list management which required them to take a spinlock. Taking a spinlock in a destructor is a bit risky since the slab allocators may run the destructors anytime they decide a slab is no longer needed. Patch drops destructor support. Any attempt to use a destructor will BUG(). Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:03 -07:00
Christoph Lameter	4037d45220	Move remote node draining out of slab allocators Currently the slab allocators contain callbacks into the page allocator to perform the draining of pagesets on remote nodes. This requires SLUB to have a whole subsystem in order to be compatible with SLAB. Moving node draining out of the slab allocators avoids a section of code in SLUB. Move the node draining so that is is done when the vm statistics are updated. At that point we are already touching all the cachelines with the pagesets of a processor. Add a expire counter there. If we have to update per zone or global vm statistics then assume that the pageset will require subsequent draining. The expire counter will be decremented on each vm stats update pass until it reaches zero. Then we will drain one batch from the pageset. The draining will cause vm counter updates which will then cause another expiration until the pcp is empty. So we will drain a batch every 3 seconds. Note that remote node draining is a somewhat esoteric feature that is required on large NUMA systems because otherwise significant portions of system memory can become trapped in pcp queues. The number of pcp is determined by the number of processors and nodes in a system. A system with 4 processors and 2 nodes has 8 pcps which is okay. But a system with 1024 processors and 512 nodes has 512k pcps with a high potential for large amount of memory being caught in them. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Christoph Lameter	d1187ed210	vmstat: use our own timer events vmstat is currently using the cache reaper to periodically bring the statistics up to date. The cache reaper does only exists in SLUB as a way to provide compatibility with SLAB. This patch removes the vmstat calls from the slab allocators and provides its own handling. The advantage is also that we can use a different frequency for the updates. Refreshing vm stats is a pretty fast job so we can run this every second and stagger this by only one tick. This will lead to some overlap in large systems. F.e a system running at 250 HZ with 1024 processors will have 4 vm updates occurring at once. However, the vm stats update only accesses per node information. It is only necessary to stagger the vm statistics updates per processor in each node. Vm counter updates occurring on distant nodes will not cause cacheline contention. We could implement an alternate approach that runs the first processor on each node at the second and then each of the other processor on a node on a subsequent tick. That may be useful to keep a large amount of the second free of timer activity. Maybe the timer folks will have some feedback on this one? [jirislaby@gmail.com: add missing break] Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Rafael J. Wysocki	8bb7844286	Add suspend-related notifications for CPU hotplug Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Christoph Lameter	5830c59021	slab: shut down cache_reaper when cpu goes down Shutdown the cache_reaper if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Heiko Carstens	38c3bd96a0	slab: use CPU_LOCK_[ACQUIRE\|RELEASE] Looks like this was forgotten when CPU_LOCK_[ACQUIRE\|RELEASE] was introduced. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
Pekka J Enberg	7ae439ce0c	krealloc: fix kerneldoc comments No "blank" (or "*") line is allowed between the function name and lines for it parameter(s). Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:46 -07:00
Alexey Dobriyan	a5c43dae7a	Fix race between cat /proc/slab_allocators and rmmod Same story as with cat /proc/*/wchan race vs rmmod race, only /proc/slab_allocators want more info than just symbol name. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
David Woodhouse	b46b8f19c9	Increase slab redzone to 64bits There are two problems with the existing redzone implementation. Firstly, it's causing misalignment of structures which contain a 64-bit integer, such as netfilter's 'struct ipt_entry' -- causing netfilter modules to fail to load because of the misalignment. (In particular, the first check in net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks()) On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8. With slab debugging, we use 32-bit redzones. And allocated slab objects aren't sufficiently aligned to hold a structure containing a uint64_t. By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable redzone checks on those architectures. By using 64-bit redzones we avoid that loss of debugging, and also fix the other problem while we're at it. When investigating this, I noticed that on 64-bit platforms we're using a 32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set aside for the redzone. Which means that the four bytes immediately before or after the allocated object at 0x00,0x00,0x00,0x00 for LE and BE machines, respectively. Which is probably not the most useful choice of poison value. One way to fix both of those at once is just to switch to 64-bit redzones in all cases. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:57 -07:00
Christoph Lameter	cfce66047f	Slab allocators: remove useless __GFP_NO_GROW flag There is no user remaining and I have never seen any use of that flag. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	4f10493459	slab allocators: Remove SLAB_CTOR_ATOMIC SLAB_CTOR atomic is never used which is no surprise since I cannot imagine that one would want to do something serious in a constructor or destructor. In particular given that the slab allocators run with interrupts disabled. Actions in constructors and destructors are by their nature very limited and usually do not go beyond initializing variables and list operations. (The i386 pgd ctor and dtors do take a spinlock in constructor and destructor..... I think that is the furthest we go at this point.) There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also establishes a certain symmetry. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	50953fe9e0	slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Akinobu Mita	824ebef122	fault injection: fix failslab with CONFIG_NUMA Currently failslab injects failures into ____cache_alloc(). But with enabling CONFIG_NUMA it's not enough to let actual slab allocator functions (kmalloc, kmem_cache_alloc, ...) return NULL. This patch moves fault injection hook inside of __cache_alloc() and __cache_alloc_node(). These are lower call path than ____cache_alloc() and enable to inject faulures to slab allocators with CONFIG_NUMA. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Christoph Lameter	5af6083990	slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN This patch was recently posted to lkml and acked by Pekka. The flag SLAB_MUST_HWCACHE_ALIGN is 1. Never checked by SLAB at all. 2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB 3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB. The only remaining use is in sparc64 and ppc64 and their use there reflects some earlier role that the slab flag once may have had. If its specified then SLAB_HWCACHE_ALIGN is also specified. The flag is confusing, inconsistent and has no purpose. Remove it. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
matze	b4169525bc	include KERN_* constant in printk() calls in mm/slab.c Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:54 -07:00
Christoph Lameter	b49af68ff9	Add virt_to_head_page and consolidate code in slab and slub Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:54 -07:00
Christoph Lameter	d85f33855c	Make page->private usable in compound pages If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid to use page->private in the first page. page->private == page for the first page, so there is no real information in there. Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful f.e. if we are going beyond PAGE_SIZE allocations in the slab on i386 because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB. Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now. Also if page->private is available then a compound page may be equipped with buffer heads. This may free up the way for filesystems to support larger blocks than page size. We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first. The RFC for the this approach was discussed at http://marc.info/?t=117574302800001&r=1&w=2 [nacc@us.ibm.com: fix hugetlbfs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Andrew Morton	a3a02be791	slab: mark set_up_list3s() __init It is only ever used prior to free_initmem(). (It will cause a warning when we run the section checking, but that's a false-positive and it simply changes the source of an existing warning, which is also a false-positive) Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Eric Dumazet	8da3430d8a	slab: NUMA kmem_cache diet Some NUMA machines have a big MAX_NUMNODES (possibly 1024), but fewer possible nodes. This patch dynamically sizes the 'struct kmem_cache' to allocate only needed space. I moved nodelists[] field at the end of struct kmem_cache, and use the following computation in kmem_cache_init() cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) + nr_node_ids * sizeof(struct kmem_list3 *); On my two nodes x86_64 machine, kmem_cache.obj_size is now 192 instead of 704 (This is because on x86_64, MAX_NUMNODES is 64) On bigger NUMA setups, this might reduce the gfporder of "cache_cache" Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Eric Dumazet	6310984694	SLAB: don't allocate empty shared caches We can avoid allocating empty shared caches and avoid unecessary check of cache->limit. We save some memory. We avoid bringing into CPU cache unecessary cache lines. All accesses to l3->shared are already checking NULL pointers so this patch is safe. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Eric Dumazet	364fbb29a0	SLAB: use num_possible_cpus() in enable_cpucache() The existing comment in mm/slab.c is perfect, so I reproduce it : /* * CPU bound tasks (e.g. network routing) can exhibit cpu bound * allocation behaviour: Most allocs on one cpu, most free operations * on another cpu. For these cases, an efficient object passing between * cpus is necessary. This is provided by a shared array. The array * replaces Bonwick's magazine layer. * On uniprocessor, it's functionally equivalent (but less efficient) * to a larger limit. Thus disabled by default. */ As most shiped linux kernels are now compiled with CONFIG_SMP, there is no way a preprocessor #if can detect if the machine is UP or SMP. Better to use num_possible_cpus(). This means on UP we allocate a 'size=0 shared array', to be more efficient. Another patch can later avoid the allocations of 'empty shared arrays', to save some memory. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Pekka Enberg	714b8171af	slab: ensure cache_alloc_refill terminates If slab->inuse is corrupted, cache_alloc_refill can enter an infinite loop as detailed by Michael Richardson in the following post: <http://lkml.org/lkml/2007/2/16/292>. This adds a BUG_ON to catch those cases. Cc: Michael Richardson <mcr@sandelman.ca> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Pekka Enberg	fd76bab2fa	slab: introduce krealloc This introduce krealloc() that reallocates memory while keeping the contents unchanged. The allocator avoids reallocation if the new size fits the currently used cache. I also added a simple non-optimized version for mm/slob.c for compatibility. [akpm@linux-foundation.org: fix warnings] Acked-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Acked-by: Matt Mackall <mpm@selenic.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:50 -07:00

1 2 3 4 5 ...

430 Commits