linux/mm
Mel Gorman dac1d27bc8 mm: use zonelists instead of zones when direct reclaiming pages
The following patches replace multiple zonelists per node with two zonelists
that are filtered based on the GFP flags.  The patches as a set fix a bug with
regard to the use of MPOL_BIND and ZONE_MOVABLE.  With this patchset, the
MPOL_BIND will apply to the two highest zones when the highest zone is
ZONE_MOVABLE.  This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE problem in 2.6.23, replacing the previously discussed
hack that filters only custom zonelists.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones while other places use the zonelist itself.

The second patch introduces a helper function node_zonelist() for looking up
the appropriate zonelist for a GFP mask which simplifies patches later in the
set.
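
As a rough illustration of what such a lookup helper does, the sketch below
is a userspace model, not the kernel code.  Every model_* and MODEL_* name is
hypothetical, and the two-zonelists-per-node layout is borrowed from the
later patches in the set purely to give the lookup something to choose
between.

```c
#include <assert.h>

/* Hypothetical flag bit standing in for __GFP_THISNODE in this model. */
#define MODEL_GFP_THISNODE	0x1u
#define MODEL_MAX_NODES		4

/* Hypothetical indices of the per-node zonelists. */
enum {
	MODEL_ZONELIST_FALLBACK,	/* normal allocations */
	MODEL_ZONELIST_THISNODE,	/* node-local allocations only */
	MODEL_NR_ZONELISTS
};

/* Simplified stand-ins; the real structures carry far more state. */
struct model_zonelist {
	int placeholder;
};

struct model_node {
	struct model_zonelist zonelists[MODEL_NR_ZONELISTS];
};

static struct model_node model_nodes[MODEL_MAX_NODES];

/*
 * Pick the zonelist of node @nid that matches @gfp_flags, so callers
 * never index the per-node array by hand.
 */
static struct model_zonelist *model_node_zonelist(int nid,
						  unsigned int gfp_flags)
{
	int idx = (gfp_flags & MODEL_GFP_THISNODE) ?
			MODEL_ZONELIST_THISNODE : MODEL_ZONELIST_FALLBACK;

	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return &model_nodes[nid].zonelists[idx];
}
```

Keeping the index calculation in one place means a later change to how
zonelists are stored only has to touch the helper, not every caller.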

The third patch defines/remembers the "preferred zone" for numa statistics, as
it is no longer always the first zone in a zonelist.

The fourth patch replaces multiple zonelists with two zonelists that are
filtered.  The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fifth patch introduces helper macros for retrieving the zone and node
indices of entries in a zonelist.

The final patch introduces filtering of the zonelists based on a nodemask.
Two zonelists exist per node, one for normal allocations and one for
__GFP_THISNODE.
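
The filtering idea can be sketched in userspace as follows.  This is only a
model under stated assumptions: struct model_zoneref, model_next_allowed()
and the plain unsigned long node bitmask are hypothetical stand-ins and do
not reproduce the kernel's data structures or iterators.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zonelist entry: which zone type it is and which node owns it. */
struct model_zoneref {
	int zone_idx;	/* higher index means a "higher" zone */
	int node;	/* owning node id, assumed smaller than the bitmask width */
};

/*
 * Walk one shared zonelist and return the index of the first entry at or
 * after @start that is usable: its zone index must not exceed
 * @highest_zone_idx (the limit derived from the GFP flags in this model)
 * and its node must be set in @nodemask.  Returns -1 if nothing qualifies.
 */
static int model_next_allowed(const struct model_zoneref *refs, size_t n,
			      size_t start, int highest_zone_idx,
			      unsigned long nodemask)
{
	assert(refs != NULL);

	for (size_t i = start; i < n; i++) {
		if (refs[i].zone_idx > highest_zone_idx)
			continue;	/* zone too high for this allocation */
		if (!(nodemask & (1UL << refs[i].node)))
			continue;	/* node excluded by the policy */
		return (int)i;
	}
	return -1;
}
```

With filtering done at iteration time, a single list per node can serve
every combination of zone limit and nodemask instead of needing a dedicated
list for each.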

Performance results varied depending on the machine configuration.  In real
workloads the gain/loss will depend on how much the userspace portion of the
benchmark benefits from having more cache available due to reduced referencing
of zonelists.

These are the ranges of performance losses/gains when running against
2.6.24-rc4-mm1.  The test machines are a mix of i386, x86_64 and ppc64,
both NUMA and non-NUMA.
                             loss   to  gain
Total CPU time on Kernbench: -0.86% to  1.13%
Elapsed   time on Kernbench: -0.79% to  0.76%
page_test from aim9:         -4.37% to  0.79%
brk_test  from aim9:         -0.71% to  4.07%
fork_test from aim9:         -1.84% to  4.60%
exec_test from aim9:         -0.71% to  1.08%

This patch:

The allocator deals with zonelists which indicate the order in which zones
should be targeted for an allocation.  Similarly, direct reclaim of pages
iterates over an array of zones.  For consistency, this patch converts direct
reclaim to use a zonelist.  No functionality is changed by this patch.  This
simplifies zonelist iterators in the next patch.
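
The shape of the conversion can be shown with a small userspace model.  The
structures and function names below are simplified, hypothetical stand-ins
(the model_ prefix marks them as such); the "reclaimable" counter merely
gives the loop something to sum and is not how reclaim is accounted.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ZONES	8	/* hypothetical bound for this model */

struct model_zone {
	const char *name;
	unsigned long reclaimable;	/* pages this zone could give back */
};

struct model_zonelist {
	/* NULL-terminated, in the order zones should be tried */
	struct model_zone *zones[MODEL_MAX_ZONES + 1];
};

/* Before: callers had to unpack the zonelist and pass its zone array. */
static unsigned long model_reclaim_from_zones(struct model_zone **zones)
{
	unsigned long total = 0;

	for (size_t i = 0; zones[i] != NULL; i++)
		total += zones[i]->reclaimable;
	return total;
}

/* After: the zonelist is passed through and unpacked in one place. */
static unsigned long model_reclaim_from_zonelist(struct model_zonelist *zonelist)
{
	struct model_zone **zones;
	unsigned long total = 0;

	assert(zonelist != NULL);
	zones = zonelist->zones;

	for (size_t i = 0; zones[i] != NULL; i++)
		total += zones[i]->reclaimable;
	return total;
}
```

Both functions visit the same zones in the same order, which is the sense in
which no functionality changes; only the interface now matches what the
allocator already passes around.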

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Kconfig sh: Bump number of quicklists for SH-5. 2008-01-28 13:18:55 +09:00
Makefile uaccess: add probe_kernel_write() 2008-04-17 20:05:36 +02:00
allocpercpu.c cpumask: Cleanup more uses of CPU_MASK and NODE_MASK 2008-04-19 19:44:58 +02:00
backing-dev.c mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init 2007-12-05 09:21:18 -08:00
bootmem.c mm: allow reserve_bootmem() cross nodes 2008-04-26 22:51:08 +02:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
dmapool.c pool: Improve memory usage for devices which can't cross boundaries 2007-12-04 10:39:58 -05:00
fadvise.c check ADVICE of fadvise64_64 even if get_xip_page is given 2008-02-05 09:44:19 -08:00
filemap.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
filemap_xip.c Use pgoff_t instead of unsigned long 2008-02-08 09:22:32 -08:00
fremap.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
highmem.c mm: highmem kernel-doc additions 2008-03-19 18:53:35 -07:00
hugetlb.c hugetlb: fix potential livelock in return_unused_surplus_hugepages() 2008-03-26 15:01:33 -07:00
internal.h Solve section mismatch for free_area_init_core. 2008-02-23 17:13:24 -08:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c speed up madvise_need_mmap_write() usage 2007-07-16 09:05:36 -07:00
memcontrol.c memcg: fix node_state handling 2008-04-08 18:25:53 -07:00
memory.c mm: remove nopage 2008-04-28 08:58:18 -07:00
memory_hotplug.c hotplug-memory: make online_page() common 2008-04-28 08:58:17 -07:00
mempolicy.c mempolicy: fix reference counting bugs 2008-03-10 18:01:19 -07:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c memcg: fix VM_BUG_ON from page migration 2008-03-04 16:35:14 -08:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mmap.c mmap_region: cleanup the final vma_merge() related code 2008-04-28 08:58:18 -07:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c nommu: add new vmalloc_user() and remap_vmalloc_range() interfaces. 2008-02-05 09:44:21 -08:00
oom_kill.c memcg: fix oops in oom handling 2008-04-15 19:35:40 -07:00
page-writeback.c writeback: speed up writeback of big dirty files 2008-02-05 09:44:19 -08:00
page_alloc.c mm: use zonelists instead of zones when direct reclaiming pages 2008-04-28 08:58:18 -07:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
pagewalk.c mm: fix possible off-by-one in walk_pte_range() 2008-04-28 08:58:16 -07:00
pdflush.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial 2008-04-21 16:36:46 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm/readahead: fix kernel-doc notation 2008-03-19 18:53:37 -07:00
rmap.c mm: remove nopage 2008-04-28 08:58:18 -07:00
shmem.c mm/shmem and tiny-shmem: fix some kernel-doc 2008-03-19 18:53:35 -07:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
slab.c nodemask: use new node_to_cpumask_ptr function 2008-04-19 19:44:59 +02:00
slob.c slob: reduce external fragmentation by using three free lists 2008-02-05 09:44:19 -08:00
slub.c slab_err: Pass parameters correctly to slab_bug 2008-04-23 12:47:48 -07:00
sparse-vmemmap.c NULL noise: fs/*, mm/*, kernel/* 2008-03-30 14:18:41 -07:00
sparse.c hotplug memory remove: generic __remove_pages() support 2008-04-28 08:58:17 -07:00
swap.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
swap_state.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
swapfile.c d_path: Make seq_path() use a struct path argument 2008-02-14 21:17:08 -08:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c fix invalidate_inode_pages2_range() to not clear ret 2008-04-28 08:58:18 -07:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
vmscan.c mm: use zonelists instead of zones when direct reclaiming pages 2008-04-28 08:58:18 -07:00
vmstat.c add "Isolate" migratetype name to /proc/pagetypeinfo 2008-04-15 19:35:41 -07:00