linux/kernel
Vladimir Davydov 4949148ad4 mm: charge/uncharge kmemcg from generic page allocator paths
Currently, to charge a non-slab allocation to kmemcg one has to use
alloc_kmem_pages helper with __GFP_ACCOUNT flag.  A page allocated with
this helper should finally be freed using free_kmem_pages, otherwise it
won't be uncharged.

This API suits its current users fine, but it turns out to be impossible
to use along with page reference counting, i.e.  when an allocation is
supposed to be freed with put_page, as it is the case with pipe or unix
socket buffers.

To overcome this limitation, this patch moves charging/uncharging to
generic page allocator paths, i.e.  to __alloc_pages_nodemask and
free_pages_prepare, and zaps alloc/free_kmem_pages helpers.  This way,
one can use any of the available page allocation functions to get the
allocated page charged to kmemcg - it's enough to pass __GFP_ACCOUNT,
just like in case of kmalloc and friends.  A charged page will be
automatically uncharged on free.

To make it possible, we need to mark pages charged to kmemcg somehow.
To avoid introducing a new page flag, we make use of page->_mapcount for
marking such pages.  Since pages charged to kmemcg are not supposed to
be mapped to userspace, it should work just fine.  There are other
(ab)users of page->_mapcount - buddy and balloon pages - but we don't
conflict with them.

In case kmemcg is compiled out or not used at runtime, this patch
introduces no overhead to generic page allocator paths.  If kmemcg is
used, it will be plus one gfp flags check on alloc and plus one
page->_mapcount check on free, which shouldn't hurt performance, because
the data accessed are hot.

Link: http://lkml.kernel.org/r/a9736d856f895bcb465d9f257b54efe32eda6f99.1464079538.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
..
bpf Merge branch 'perf/urgent' into perf/core, to pick up fixes before merging new changes 2016-07-07 08:58:23 +02:00
configs
debug mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings 2016-02-22 08:51:37 +01:00
events Merge branch 'perf/urgent' into perf/core, to pick up fixes before merging new changes 2016-07-07 08:58:23 +02:00
gcov gcov: add support for gcc version >= 6 2016-07-15 14:54:27 +09:00
irq genirq: Fix missing irq allocation affinity hint 2016-07-19 10:49:47 +02:00
livepatch Merge branches 'for-4.7/core', 'for-4.7/livepatching-doc' and 'for-4.7/livepatching-ppc64' into for-linus 2016-05-17 12:06:35 +02:00
locking Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 12:41:29 -07:00
power Merge branch 'x86/mm' into x86/boot, to pick up dependencies 2016-07-08 17:27:47 +02:00
printk printk/nmi: flush NMI messages on the system panic 2016-05-20 17:58:30 -07:00
rcu Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 12:41:29 -07:00
sched Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 14:43:00 -07:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 20:43:12 -07:00
trace Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-29 11:50:42 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile ELF/MIPS build fix 2016-05-23 17:04:14 -07:00
acct.c
async.c
audit.c Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit 2016-06-29 15:18:47 -07:00
audit.h audit: move audit_get_tty to reduce scope and kabi changes 2016-06-28 15:48:48 -04:00
audit_fsnotify.c
audit_tree.c audit: cleanup prune_tree_thread 2016-04-04 09:46:47 -04:00
audit_watch.c don't bother with ->d_inode->i_sb - it's always equal to ->d_sb 2016-04-10 17:11:51 -04:00
auditfilter.c
auditsc.c Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit 2016-06-29 15:18:47 -07:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup: Disable IRQs while holding css_set_lock 2016-06-23 17:23:12 -04:00
cgroup_freezer.c
cgroup_pids.c
compat.c
configs.c
context_tracking.c
cpu.c cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble 2016-07-13 09:29:39 +02:00
cpu_pm.c
cpuset.c cpuset: use static key better and convert to new API 2016-05-19 19:12:14 -07:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 13:59:34 -07:00
extable.c
fork.c mm: charge/uncharge kmemcg from generic page allocator paths 2016-07-26 16:19:19 -07:00
freezer.c
futex.c futex: Calculate the futex key based on a tail page for file-based futexes 2016-06-08 19:23:54 +02:00
futex_compat.c
groups.c
hung_task.c kernel/hung_task.c: use timeout diff when timeout is updated 2016-03-22 15:36:02 -07:00
irq_work.c
jump_label.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 12:41:29 -07:00
kallsyms.c kallsyms: add support for relative offsets in kallsyms address table 2016-03-15 16:55:16 -07:00
kcmp.c
kcov.c kernel/kcov: unproxify debugfs file's fops 2016-06-15 04:56:35 -07:00
kexec.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2016-05-23 17:04:14 -07:00
kexec_core.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2016-05-23 17:04:14 -07:00
kexec_file.c kexec: introduce a protection mechanism for the crashkernel reserved memory 2016-05-23 17:04:14 -07:00
kexec_internal.h
kmod.c
kprobes.c
ksysfs.c
kthread.c
latencytop.c
membarrier.c
memremap.c memremap: add arch specific hook for MEMREMAP_WB mappings 2016-04-04 10:26:41 +02:00
module-internal.h
module.c module: preserve Elf information for livepatch modules 2016-04-01 15:00:10 +02:00
module_signing.c KEYS: Move the point of trust determination to __key_link() 2016-04-11 22:43:43 +01:00
notifier.c
nsproxy.c
padata.c kernel/padata.c: hide unused functions 2016-05-19 19:12:14 -07:00
panic.c printk/nmi: flush NMI messages on the system panic 2016-05-20 17:58:30 -07:00
params.c
pid.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
pid_namespace.c
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2016-03-22 15:36:02 -07:00
ptrace.c ptrace: change __ptrace_unlink() to clear ->ptrace under ->siglock 2016-03-22 15:36:02 -07:00
range.c
reboot.c
relay.c kernel/relay.c: fix potential memory leak 2016-06-09 14:23:11 -07:00
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2016-04-14 12:56:09 -07:00
seccomp.c Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-05-19 10:02:26 -07:00
signal.c signals: Use hrtimer for sigtimedwait() 2016-07-07 10:35:07 +02:00
smp.c locking/barriers: Replace smp_cond_acquire() with smp_cond_load_acquire() 2016-06-14 11:54:27 +02:00
smpboot.c cpu/hotplug: Unpark smpboot threads from the state machine 2016-03-01 20:36:56 +01:00
smpboot.h cpu/hotplug: Create hotplug threads 2016-03-01 20:36:56 +01:00
softirq.c arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
stacktrace.c
stop_machine.c
sys.c prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable 2016-05-23 17:04:14 -07:00
sys_ni.c
sysctl.c rcu: sysctl: Panic on RCU Stall 2016-06-15 16:00:05 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use generic UUID library 2016-05-20 17:58:30 -07:00
task_work.c locking/spinlock: Update spin_unlock_wait() users 2016-06-14 11:55:15 +02:00
taskstats.c taskstats: use the libnl API to align nlattr on 64-bit 2016-04-23 20:13:25 -04:00
test_kprobes.c
torture.c torture: Stop onoff task if there is only one cpu 2016-06-14 16:03:28 -07:00
tracepoint.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
tsacct.c time, acct: Drop irq save & restore from __acct_update_integrals() 2016-02-29 09:53:09 +01:00
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" 2016-07-10 20:58:36 +02:00
workqueue.c workqueue: Fix setting affinity of unbound worker threads 2016-06-16 15:37:05 -04:00
workqueue_internal.h sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() 2016-03-02 10:28:47 -05:00