linux/mm
Rik van Riel a79311c14e vmscan: bail out of direct reclaim after swap_cluster_max pages
When the VM is under pressure, it can happen that several direct reclaim
processes are in the pageout code simultaneously.  It also happens that
the reclaiming processes run into mostly referenced, mapped and dirty
pages in the first round.

This results in multiple direct reclaim processes having a lower
pageout priority, which corresponds to a higher target of pages to
scan.

This in turn can result in each direct reclaim process freeing
many pages.  Together, they can end up freeing way too many pages.

This kicks useful data out of memory (in some cases more than half
of all memory is swapped out).  It also impacts performance by
keeping tasks stuck in the pageout code for too long.

A 30% improvement in hackbench has been observed with this patch.

The fix is relatively simple: in shrink_zone() we can check how many
pages we have already freed, direct reclaim tasks break out of the
scanning loop if they have already freed enough pages and have reached
a lower priority level.

We do not break out of shrink_zone() when priority == DEF_PRIORITY,
to ensure that equal pressure is applied to every zone in the common
case.

However, in order to do this we do need to know how many pages we already
freed, so move nr_reclaimed into scan_control.

akpm: a historical interlude...

We tried this in 2004:

:commit e468e46a9bea3297011d5918663ce6d19094cf87
:Author: akpm <akpm>
:Date:   Thu Jun 24 15:53:52 2004 +0000
:
:[PATCH] vmscan.c: dont reclaim too many pages
:
:    The shrink_zone() logic can, under some circumstances, cause far too many
:    pages to be reclaimed.  Say, we're scanning at high priority and suddenly hit
:    a large number of reclaimable pages on the LRU.
:    Change things so we bale out when SWAP_CLUSTER_MAX pages have been reclaimed.

And we reverted it in 2006:

:commit 210fe53030
:Author: Andrew Morton <akpm@osdl.org>
:Date:   Fri Jan 6 00:11:14 2006 -0800
:
:    [PATCH] vmscan: balancing fix
:
:    Revert a patch which went into 2.6.8-rc1.  The changelog for that patch was:
:
:      The shrink_zone() logic can, under some circumstances, cause far too many
:      pages to be reclaimed.  Say, we're scanning at high priority and suddenly
:      hit a large number of reclaimable pages on the LRU.
:
:      Change things so we bale out when SWAP_CLUSTER_MAX pages have been
:      reclaimed.
:
:    Problem is, this change caused significant imbalance in inter-zone scan
:    balancing by truncating scans of larger zones.
:
:    Suppose, for example, ZONE_HIGHMEM is 10x the size of ZONE_NORMAL.  The zone
:    balancing algorithm would require that if we're scanning 100 pages of
:    ZONE_HIGHMEM, we should scan 10 pages of ZONE_NORMAL.  But this logic will
:    cause the scanning of ZONE_HIGHMEM to bale out after only 32 pages are
:    reclaimed.  Thus effectively causing smaller zones to be scanned relatively
:    harder than large ones.
:
:    Now I need to remember what the workload was which caused me to write this
:    patch originally, then fix it up in a different way...

And we haven't demonstrated that whatever problem caused that reversion is
not being reintroduced by this change in 2008.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:06 -08:00
..
allocpercpu.c
backing-dev.c mm: change dirty limit type specifiers to unsigned long 2009-01-06 15:59:02 -08:00
bootmem.c misc: replace __FUNCTION__ with __func__ 2008-10-16 11:21:30 -07:00
bounce.c bounce: don't rely on a zeroed bio_vec list 2008-12-29 08:29:52 +01:00
dmapool.c
fadvise.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
failslab.c SLUB: failslab support 2008-12-29 11:27:46 +02:00
filemap_xip.c
filemap.c mm: write_cache_pages integrity fix 2009-01-06 15:58:59 -08:00
fremap.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
highmem.c
hugetlb.c hugetlb: fix sparse warnings 2009-01-06 15:59:06 -08:00
internal.h hugetlb: pull gigantic page initialisation out of the default path 2008-11-06 15:41:18 -08:00
Kconfig Unevictable LRU Infrastructure 2008-10-20 08:50:26 -07:00
maccess.c
madvise.c
Makefile SLUB: failslab support 2008-12-29 11:27:46 +02:00
memcontrol.c mm: make mem_cgroup_resize_limit() static 2009-01-06 15:59:04 -08:00
memory_hotplug.c mm: remove GFP_HIGHUSER_PAGECACHE 2009-01-06 15:59:01 -08:00
memory.c mm: make maddr __iomem 2009-01-06 15:59:04 -08:00
mempolicy.c Merge branch 'master' into next 2008-11-14 11:29:12 +11:00
mempool.c
migrate.c mm: add Set,ClearPageSwapCache stubs 2009-01-06 15:59:02 -08:00
mincore.c
mlock.c x86, bts: memory accounting 2008-12-20 09:15:47 +01:00
mm_init.c
mmap.c mm: update my address 2009-01-05 17:44:42 -08:00
mmu_notifier.c
mmzone.c
mprotect.c mm: cleanup: remove #ifdef CONFIG_MIGRATION 2009-01-06 15:59:00 -08:00
mremap.c mm: update my address 2009-01-05 17:44:42 -08:00
msync.c add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
nommu.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
oom_kill.c oom: print triggering task's cpuset and mems allowed 2009-01-06 15:58:59 -08:00
page_alloc.c mm: make setup_per_zone_inactive_ratio() static 2009-01-06 15:59:05 -08:00
page_cgroup.c mm: make init_section_page_cgroup() static 2009-01-06 15:59:04 -08:00
page_io.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c mm: add dirty_background_bytes and dirty_bytes sysctls 2009-01-06 15:59:03 -08:00
pagewalk.c
pdflush.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
prio_tree.c
quicklist.c
readahead.c vmscan: split LRU lists into anon & file sets 2008-10-20 08:50:25 -07:00
rmap.c mm: further cleanup page_add_new_anon_rmap 2009-01-06 15:59:02 -08:00
shmem_acl.c
shmem.c mm: don't mark_page_accessed in shmem_fault 2009-01-06 15:58:58 -08:00
slab.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
slob.c slob: do not pass the SLAB flags as GFP in kmem_cache_create() 2008-12-15 16:27:06 -08:00
slub.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c meminit section warnings 2008-11-30 10:03:35 -08:00
swap_state.c mm: remove gfp_mask from add_to_swap 2009-01-06 15:59:04 -08:00
swap.c mm: try_to_free_swap replaces remove_exclusive_swap_page 2009-01-06 15:59:03 -08:00
swapfile.c swapfile: let others seed random 2009-01-06 15:59:06 -08:00
thrash.c
tiny-shmem.c Export tiny shmem_file_setup for DRM-GEM 2008-10-20 16:17:42 -07:00
truncate.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
util.c
vmalloc.c mm: vmalloc make lazy unmapping configurable 2009-01-06 15:59:01 -08:00
vmscan.c vmscan: bail out of direct reclaim after swap_cluster_max pages 2009-01-06 15:59:06 -08:00
vmstat.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30