linux/mm
Lee Schermerhorn 480eccf9ae Fix NUMA Memory Policy Reference Counting
This patch proposes fixes to the reference counting of memory policy in the
page allocation paths and in show_numa_map().  Extracted from my "Memory
Policy Cleanups and Enhancements" series as stand-alone.

Shared policy lookup [shmem] has always added a reference to the policy,
but that reference was never dropped after page allocation or after
formatting the numa map data.

Default system policy should not require additional ref counting, nor
should the current task's task policy.  However, show_numa_map() calls
get_vma_policy() to examine what may be [likely is] another task's policy.
The latter case needs protection against freeing of the policy.

This patch adds a reference count to a mempolicy returned by
get_vma_policy() when the policy is a vma policy or another task's
mempolicy.  Again, shared policy is already reference counted on lookup.  A
matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
shared and vma policies, and in show_numa_map() for shared and another
task's mempolicy.  We can call __mpol_free() directly, saving an admittedly
inexpensive inline NULL test, because we know we have a non-NULL policy.
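
The get/unref pairing described above can be modelled in userspace. This is a minimal sketch, not the kernel code: every `sketch_` name is hypothetical, the refcount is a plain int rather than an atomic, and the "allocation" is a placeholder comment. It only illustrates which policy sources take a reference on lookup and that each one taken is dropped again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum sketch_source {
	SKETCH_DEFAULT,       /* system default policy: never ref counted */
	SKETCH_CURRENT_TASK,  /* current task's own policy: never ref counted */
	SKETCH_OTHER_TASK,    /* another task's policy, as seen by show_numa_map() */
	SKETCH_VMA,           /* policy attached to a vma */
	SKETCH_SHARED         /* shared [shmem] policy: lookup always takes a ref */
};

struct sketch_mempolicy {
	int refcnt;                 /* plain int here; an assumption for brevity */
	enum sketch_source source;
};

/* True when the lookup takes a reference the caller must drop later. */
static int sketch_lookup_takes_ref(const struct sketch_mempolicy *pol)
{
	return pol->source == SKETCH_OTHER_TASK ||
	       pol->source == SKETCH_VMA ||
	       pol->source == SKETCH_SHARED;
}

/* Models the lookup: add a reference only where the policy could be freed
 * underneath the caller. */
static struct sketch_mempolicy *sketch_get_policy(struct sketch_mempolicy *pol)
{
	if (sketch_lookup_takes_ref(pol))
		pol->refcnt++;
	return pol;
}

/* Models the direct unref: the caller already knows pol is non-NULL. */
static void sketch_put_policy(struct sketch_mempolicy *pol)
{
	assert(pol->refcnt > 0);
	pol->refcnt--;
}

/* Models an allocation path: look up, use the policy, then drop the
 * reference if one was taken. Returns the refcount left afterwards. */
static int sketch_alloc_page(struct sketch_mempolicy *pol)
{
	struct sketch_mempolicy *p = sketch_get_policy(pol);

	/* ... the page allocation would consult p here ... */

	if (sketch_lookup_takes_ref(p))
		sketch_put_policy(p);
	return p->refcnt;
}
```

In this model the count after `sketch_alloc_page()` equals the count before it for every source, which is the balance the patch is meant to restore.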

Handling policy ref counts for hugepages is a bit trickier.
huge_zonelist() returns a zone list that might come from a shared or vma
MPOL_BIND policy.  In this case, we should hold the reference until after the
huge page allocation in dequeue_huge_page().  The patch modifies
huge_zonelist() to return a pointer to the mempolicy if it needs to be
unref'd after allocation.
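
The hugepage case can be sketched the same way. This is a userspace model under stated assumptions: all `sketch_` names are hypothetical, the out-parameter shape is only one way to "return a pointer to the mempolicy", and reading `first_node` stands in for the actual huge page allocation. The point it shows is that the reference must outlive the zone list when the zone list is stored inside the policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum sketch_mode { SKETCH_MODE_OTHER, SKETCH_MODE_BIND };

struct sketch_zonelist { int first_node; };

struct sketch_policy {
	int refcnt;
	enum sketch_mode mode;
	struct sketch_zonelist bind_zonelist; /* lives inside the policy */
};

/* Zone list that does not belong to any policy. */
static struct sketch_zonelist sketch_node_zonelist = { 0 };

/* Returns the zone list to allocate from. *mpol is set to the policy when
 * the caller must drop a reference after allocating, otherwise to NULL. */
static struct sketch_zonelist *
sketch_huge_zonelist(struct sketch_policy *pol, struct sketch_policy **mpol)
{
	*mpol = NULL;
	pol->refcnt++;          /* reference taken by the policy lookup */

	if (pol->mode == SKETCH_MODE_BIND) {
		*mpol = pol;    /* zone list is part of pol: keep the ref */
		return &pol->bind_zonelist;
	}

	pol->refcnt--;          /* zone list is independent: drop the ref now */
	return &sketch_node_zonelist;
}

/* Models the allocation: use the zone list first, unref only afterwards. */
static int sketch_dequeue_huge_page(struct sketch_policy *pol)
{
	struct sketch_policy *mpol;
	struct sketch_zonelist *zl = sketch_huge_zonelist(pol, &mpol);
	int node = zl->first_node;  /* stand-in for the real allocation */

	if (mpol) {
		assert(mpol->refcnt > 0);
		mpol->refcnt--;
	}
	return node;
}
```

Dropping the reference inside `sketch_huge_zonelist()` in the bind case would leave the caller holding a zone list whose backing policy could be freed before the allocation completes.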

Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:

              w/o patch             w/ refcount patch
             Avg    Std Devn          Avg    Std Devn
Real:      100.59     0.38          100.63     0.43
User:     1209.60     0.37         1209.91     0.31
System:     81.52     0.42           81.64     0.34

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
allocpercpu.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
backing-dev.c
bootmem.c
bounce.c [BLOCK] Get rid of request_queue_t typedef 2007-07-24 09:28:11 +02:00
fadvise.c
filemap_xip.c mm: fault feedback #2 2007-07-19 10:04:41 -07:00
filemap.c Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2007-08-11 16:01:06 -07:00
filemap.h
fremap.c only allow nonlinear vmas for ram backed filesystems 2007-07-19 10:04:41 -07:00
highmem.c
hugetlb.c Fix NUMA Memory Policy Reference Counting 2007-09-19 11:24:18 -07:00
internal.h
Kconfig Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION 2007-07-29 16:45:38 -07:00
madvise.c
Makefile CONFIG_BOUNCE to avoid useless inclusion of bounce buffer logic 2007-07-17 10:23:02 -07:00
memory_hotplug.c
memory.c remove handle_mm_fault export 2007-07-21 17:49:16 -07:00
mempolicy.c Fix NUMA Memory Policy Reference Counting 2007-09-19 11:24:18 -07:00
mempool.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
migrate.c fix rcu_read_lock() in page migraton 2007-08-31 01:42:22 -07:00
mincore.c
mlock.c
mmap.c fix NULL pointer dereference in __vm_enough_memory() 2007-08-22 19:52:45 -07:00
mmzone.c
mprotect.c mm: variable length argument support 2007-07-19 10:04:45 -07:00
mremap.c mm: variable length argument support 2007-07-19 10:04:45 -07:00
msync.c
nommu.c fix NULL pointer dereference in __vm_enough_memory() 2007-08-22 19:52:45 -07:00
oom_kill.c oom: print points as unsigned long 2007-07-31 15:39:36 -07:00
page_alloc.c process_zones(): fix recovery code 2007-08-31 01:42:22 -07:00
page_io.c
page-writeback.c move page writeback acounting out of macros 2007-07-19 10:04:52 -07:00
pdflush.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
prio_tree.c
quicklist.c
readahead.c readahead: sanify file_ra_state names 2007-07-19 10:04:44 -07:00
rmap.c mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
shmem_acl.c
shmem.c mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
slab.c slab: skip calling cache_free_alien() when the platform is not numa capable 2007-08-22 19:52:46 -07:00
slob.c slob: reduce list scanning 2007-07-21 17:49:16 -07:00
slub.c SLUB: accurately compare debug flags during slab cache merge 2007-09-11 17:21:27 -07:00
sparse.c sparsemem: ensure we initialise the node mapping for SPARSEMEM_STATIC 2007-08-22 19:52:44 -07:00
swap_state.c
swap.c
swapfile.c Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION 2007-07-29 16:45:38 -07:00
thrash.c
tiny-shmem.c
truncate.c mm: merge populate and nopage into fault (fixes nonlinear) 2007-07-19 10:04:41 -07:00
util.c add kstrndup 2007-07-18 08:47:39 -07:00
vmalloc.c lguest: export symbols for lguest as a module 2007-07-19 10:04:52 -07:00
vmscan.c synchronous lumpy reclaim: wait for page writeback when directly reclaiming contiguous areas 2007-08-22 19:52:45 -07:00
vmstat.c Remove fs.h from mm.h 2007-07-29 17:09:29 -07:00