linux/mm
David Rientjes 119d6d59dc mm, compaction: avoid isolating pinned pages
Page migration will fail for memory that is pinned in memory with, for
example, get_user_pages().  In this case, it is unnecessary to take
zone->lru_lock or isolating the page and passing it to page migration
which will ultimately fail.

This is a racy check, the page can still change from under us, but in
that case we'll just fail later when attempting to move the page.

This avoids very expensive memory compaction when faulting transparent
hugepages after pinning a lot of memory with a Mellanox driver.

On a 128GB machine and pinning ~120GB of memory, before this patch we
see the enormous disparity in the number of page migration failures
because of the pinning (from /proc/vmstat):

	compact_pages_moved 8450
	compact_pagemigrate_failed 15614415

0.05% of pages isolated are successfully migrated and explicitly
triggering memory compaction takes 102 seconds.  After the patch:

	compact_pages_moved 9197
	compact_pagemigrate_failed 7

99.9% of pages isolated are now successfully migrated in this
configuration and memory compaction takes less than one second.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
..
Kconfig mm/Kconfig: fix URL for zsmalloc benchmark 2014-03-10 17:26:20 -07:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
Makefile mm: thrash detection-based file cache sizing 2014-04-03 16:21:01 -07:00
backing-dev.c bdi: avoid oops on device removal 2014-04-03 16:20:49 -07:00
balloon_compaction.c mm: print more details for bad_page() 2014-01-23 16:36:50 -08:00
bootmem.c mm/bootmem.c: remove unused local `map' 2013-11-13 12:09:09 +09:00
bounce.c block: Convert bio_for_each_segment() to bvec_iter 2013-11-23 22:33:49 -08:00
cleancache.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
compaction.c mm, compaction: avoid isolating pinned pages 2014-04-03 16:21:01 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c dmapool: make DMAPOOL_DEBUG detect corruption of free marker 2012-12-11 17:22:24 -08:00
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
fremap.c mm: fix bad rss-counter if remap_file_pages raced migration 2014-03-19 16:21:49 -07:00
frontswap.c frontswap: fix incorrect zeroing and allocation size for frontswap_map 2013-06-12 16:29:46 -07:00
highmem.c Some nice cleanups, and even a patch my wife did as a "live" demo for 2012-12-20 08:37:05 -08:00
huge_memory.c mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking 2014-03-04 07:55:48 -08:00
hugetlb.c mm, hugetlb: mark some bootstrap functions as __init 2014-04-03 16:21:01 -07:00
hugetlb_cgroup.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
hwpoison-inject.c mm/hwpoison: add '#' to hwpoison_inject 2014-01-21 16:19:48 -08:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm/page-writeback.c: do not count anon pages as dirtyable memory 2014-01-29 16:22:39 -08:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
kmemcheck.c
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: change some global variables to int 2014-04-03 16:20:50 -07:00
ksm.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
list_lru.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood 2013-09-30 14:31:02 -07:00
memblock.c memblock: add limit checking to memblock_virt_alloc 2014-01-29 16:22:40 -08:00
memcontrol.c memcg: reparent charges of children before processing parent 2014-03-04 07:55:48 -08:00
memory-failure.c mm: close PageTail race 2014-03-04 07:55:47 -08:00
memory.c mm, thp: fix infinite loop on memcg OOM 2014-02-25 15:25:44 -08:00
memory_hotplug.c mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug 2014-01-23 16:36:52 -08:00
mempolicy.c mm: optimize put_mems_allowed() usage 2014-04-03 16:20:58 -07:00
mempool.c mm/mempool.c: convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...) 2013-09-11 15:58:14 -07:00
migrate.c mm: fix swapops.h:131 bug if remap_file_pages raced migration 2014-03-20 22:09:09 -07:00
mincore.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
mlock.c mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE 2014-01-23 16:36:50 -08:00
mm_init.c mm: bring back /sys/kernel/mm 2014-01-27 21:02:39 -08:00
mmap.c mm: Add new func _install_special_mapping() to mmap.c 2014-03-18 12:51:56 -07:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c mm: audit/fix non-modular users of module_init in core code 2014-01-23 16:36:52 -08:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit 2014-02-17 11:19:36 +11:00
mremap.c mm: revert mremap pud_free anti-fix 2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c mm/nobootmem: free_all_bootmem again 2014-01-23 16:36:52 -08:00
nommu.c mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
oom_kill.c mm, oom: base root bonus on current usage 2014-01-30 16:56:56 -08:00
page-writeback.c mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() 2014-02-06 13:48:51 -08:00
page_alloc.c mm: optimize put_mems_allowed() usage 2014-04-03 16:20:58 -07:00
page_cgroup.c Merge branch 'akpm' (incoming from Andrew) 2014-01-21 19:05:45 -08:00
page_io.c Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c percpu: renew the max_contig if we merge the head and previous block 2014-03-29 09:29:42 -04:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
process_vm_access.c mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types 2014-03-06 16:30:47 +01:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
rmap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2014-03-31 14:35:30 -07:00
shmem.c mm + fs: prepare for non-page entries in page cache radix trees 2014-04-03 16:21:00 -07:00
slab.c mm: optimize put_mems_allowed() usage 2014-04-03 16:20:58 -07:00
slab.h memcg, slab: RCU protect memcg_params for root caches 2014-01-23 16:36:51 -08:00
slab_common.c slab: fix wrong retval on kmem_cache_create_memcg error path 2014-01-29 16:22:40 -08:00
slob.c mm/sl[aou]b: Move kmallocXXX functions to common code 2013-09-04 20:51:33 +03:00
slub.c mm: optimize put_mems_allowed() usage 2014-04-03 16:20:58 -07:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c sparse: fix comment 2014-04-02 09:16:17 +02:00
swap.c mm: thrash detection-based file cache sizing 2014-04-03 16:21:01 -07:00
swap_state.c swap: add a simple detector for inappropriate swapin readahead 2014-02-06 13:48:51 -08:00
swapfile.c mm/swap: fix race on swap_info reuse between swapoff and swapon 2014-02-06 13:48:51 -08:00
truncate.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
util.c mm: add overcommit_kbytes sysctl variable 2014-01-21 16:19:44 -08:00
vmalloc.c Revert "mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page}" 2014-01-27 21:02:39 -08:00
vmpressure.c arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
vmscan.c mm: thrash detection-based file cache sizing 2014-04-03 16:21:01 -07:00
vmstat.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
workingset.c mm: keep page cache radix tree nodes in check 2014-04-03 16:21:01 -07:00
zbud.c mm/zbud: fix some trivial typos in comments 2013-09-11 15:57:35 -07:00
zsmalloc.c zsmalloc: add copyright 2014-01-30 16:56:55 -08:00
zswap.c mm/zswap.c: change params from hidden to ro 2014-01-23 16:36:50 -08:00