linux/kernel
Eric B Munson 5bbe3547aa mm: allow compaction of unevictable pages
Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of this is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion.  The
compaction code today does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state.  This patch introduces a sysctl for controlling compaction
behavior with respect to the unevictable lru.  Users who demand no page
faults after a page is present can set compact_unevictable_allowed to 0
and users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
after fragmenting memory.  When the value is set to 1, allocations
succeed.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15 16:35:17 -07:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-04-15 09:00:47 -07:00
configs
debug debug: prevent entering debug mode on panic/exception. 2015-02-19 12:39:03 -06:00
events Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-04-15 09:00:47 -07:00
gcov kbuild,gcov: simplify kernel/gcov/Makefile more 2015-01-09 17:25:44 +01:00
irq irqchip core change for v4.1 (round 3) 2015-04-11 11:17:28 +02:00
livepatch Merge branch 'for-4.1/core-noarch' into for-linus 2015-04-13 23:57:20 +02:00
locking Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 10:27:28 -07:00
power Merge back earlier suspend/hibernate material for v4.1. 2015-04-10 12:01:59 +02:00
printk Merge branch 'iocb' into for-next 2015-04-11 22:24:41 -04:00
rcu Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a', 'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a', 'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEAD 2015-03-20 08:31:01 -07:00
sched Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-14 13:58:48 -07:00
time timers/PM: Drop unnecessary braces from tick_freeze() 2015-04-03 15:15:52 +02:00
trace Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-14 14:37:47 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/mcs: Better differentiate between MCS variants 2015-01-14 15:07:32 +01:00
Kconfig.preempt
Makefile Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2015-02-11 20:25:11 -08:00
acct.c new fs_pin killing logics 2015-01-25 23:17:28 -05:00
async.c
audit.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-12-30 10:45:47 -08:00
audit.h audit: replace getname()/putname() hacks with reference counters 2015-01-23 00:23:58 -05:00
audit_tree.c
audit_watch.c
auditfilter.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-02-11 20:07:47 -08:00
auditsc.c Merge branch 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 15:27:47 -08:00
backtracetest.c
bounds.c
capability.c
cgroup.c memcg: zap mem_cgroup_lookup() 2015-04-15 16:35:16 -07:00
cgroup_freezer.c
compat.c all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
configs.c
context_tracking.c context_tracking: Export context_tracking_user_enter/exit 2015-03-09 15:43:00 +01:00
cpu.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-14 13:36:04 -07:00
cpu_pm.c
cpuset.c kernel, cpuset: remove exception for __GFP_THISNODE 2015-04-14 16:49:03 -07:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c oom, PM: make OOM detection in the freezer path raceless 2015-02-11 17:06:03 -08:00
extable.c
fork.c mm: do not use mm->nr_pmds on !MMU configurations 2015-02-12 18:54:10 -08:00
freezer.c
futex.c Linux 34.0-rc1 2015-02-24 08:41:07 +01:00
futex_compat.c
groups.c
hung_task.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
kexec.c kexec: simplify conditional 2015-02-17 14:34:51 -08:00
kmod.c
kprobes.c kprobes: makes kprobes/enabled works correctly for optimized kprobes. 2015-02-13 21:21:42 -08:00
ksysfs.c
kthread.c
latencytop.c
module-internal.h
module.c Some clean ups and small fixes, but the biggest change is the addition 2015-04-14 10:49:03 -07:00
module_signing.c
notifier.c rcu: Make SRCU optional by using CONFIG_SRCU 2015-01-06 11:04:29 -08:00
nsproxy.c
padata.c padata: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:38 -08:00
panic.c livepatch: kernel: add TAINT_LIVEPATCH 2014-12-22 15:40:48 +01:00
params.c param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC 2015-01-20 11:38:31 +10:30
pid.c
pid_namespace.c
profile.c profile: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:38 -08:00
ptrace.c ptrace: remove linux/compat.h inclusion under CONFIG_COMPAT 2015-02-17 14:34:51 -08:00
range.c kernel: avoid overflow in cmp_range 2015-01-17 10:02:23 +13:00
reboot.c
relay.c
resource.c resources: Move struct resource_list_entry from ACPI into resource core 2015-02-05 15:09:25 +01:00
seccomp.c seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO 2015-02-17 14:34:55 -08:00
signal.c signal: use current->state helpers 2015-02-17 14:34:51 -08:00
smp.c
smpboot.c smpboot: Add common code for notification from dying CPU 2015-03-11 13:20:25 -07:00
smpboot.h
softirq.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 15:24:03 -08:00
stacktrace.c
stop_machine.c
sys.c kernel/sys.c: fix UNAME26 for 4.0 2015-02-28 09:57:51 -08:00
sys_ni.c
sysctl.c mm: allow compaction of unevictable pages 2015-04-15 16:35:17 -07:00
sysctl_binary.c
system_certificates.S
system_keyring.c
task_work.c
taskstats.c netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c Merge branch 'akpm' (patches from Andrew) 2015-04-14 16:49:17 -07:00
workqueue.c workqueue: Reorder sysfs code 2015-04-06 11:16:04 -04:00
workqueue_internal.h