linux/kernel
Peter Zijlstra 6e5fb223e8 mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate
Previously, to probe the working set of a task, we'd use
a very simple and crude method: mark all of its address
space PROT_NONE.

That method has various (obvious) disadvantages:

 - it samples the working set at dissimilar rates,
   giving some tasks a sampling quality advantage
   over others.

 - creates performance problems for tasks with very
   large working sets

 - over-samples processes with large address spaces but
   which only very rarely execute

Improve that method by keeping a rotating offset into the
address space that marks the current position of the scan,
and advance it by a constant rate (in a CPU cycles execution
proportional manner). If the offset reaches the last mapped
address of the mm then it then it starts over at the first
address.

The per-task nature of the working set sampling functionality in this tree
allows such constant rate, per task, execution-weight proportional sampling
of the working set, with an adaptive sampling interval/frequency that
goes from once per 100ms up to just once per 8 seconds.  The current
sampling volume is 256 MB per interval.

As tasks mature and converge their working set, so does the
sampling rate slow down to just a trickle, 256 MB per 8
seconds of CPU time executed.

This, beyond being adaptive, also rate-limits rarely
executing systems and does not over-sample on overloaded
systems.

[ In AutoNUMA speak, this patch deals with the effective sampling
  rate of the 'hinting page fault'. AutoNUMA's scanning is
  currently rate-limited, but it is also fundamentally
  single-threaded, executing in the knuma_scand kernel thread,
  so the limit in AutoNUMA is global and does not scale up with
  the number of CPUs, nor does it scan tasks in an execution
  proportional manner.

  So the idea of rate-limiting the scanning was first implemented
  in the AutoNUMA tree via a global rate limit. This patch goes
  beyond that by implementing an execution rate proportional
  working set sampling rate that is not implemented via a single
  global scanning daemon. ]

[ Dan Carpenter pointed out a possible NULL pointer dereference in the
  first version of this patch. ]

Based-on-idea-by: Andrea Arcangeli <aarcange@redhat.com>
Bug-Found-By: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
[ Wrote changelog and fixed bug. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
2012-12-11 14:42:46 +00:00
..
debug KGDB/KDB fixes and cleanups 2012-10-13 11:16:58 +09:00
events Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/urgent 2012-10-21 18:18:17 +02:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq irqdomain: augment add_simple() to allocate descs 2012-10-10 08:57:26 +02:00
power Merge branch 'pm-qos' 2012-09-17 20:25:51 +02:00
sched mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate 2012-12-11 14:42:46 +00:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-12 22:17:48 +09:00
trace Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent 2012-10-21 19:53:34 +02:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking: Adjust spin lock inlining Kconfig options 2012-09-13 17:56:13 +02:00
Kconfig.preempt locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
Makefile Makefile: Documentation for external tool should be correct 2012-10-25 16:00:53 -07:00
acct.c vfs: make path_openat take a struct filename pointer 2012-10-12 20:15:09 -04:00
async.c [SCSI] async: make async_synchronize_full() flush all work regardless of domain 2012-07-20 09:07:37 +01:00
audit.c fs: handle failed audit_log_start properly 2012-10-09 23:33:37 -04:00
audit.h audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
audit_tree.c audit: clean up refcounting in audit-tree 2012-08-15 12:55:22 +02:00
audit_watch.c audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditfilter.c audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditsc.c audit: make audit_inode take struct filename 2012-10-12 20:15:09 -04:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c userns: Teach inode_capable to understand inodes whose uids map to other namespaces. 2012-05-15 14:59:24 -07:00
cgroup.c Revert "cgroup: Remove task_lock() from cgroup_post_fork()" 2012-10-19 14:09:35 -07:00
cgroup_freezer.c cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them 2012-09-14 12:01:16 -07:00
compat.c new helper: sigsuspend() 2012-05-21 23:52:30 -04:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu.c CPU hotplug, debug: detect imbalance between get_online_cpus() and put_online_cpus() 2012-10-09 16:22:15 +09:00
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpuset.c cpusets: Remove/update outdated comments 2012-07-24 13:53:28 +02:00
crash_dump.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cred.c userns: Make credential debugging user namespace safe. 2012-08-23 22:54:18 -07:00
delayacct.c KVM: Steal time implementation 2011-07-14 12:59:14 +03:00
dma.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
elfcore.c
exec_domain.c
exit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
extable.c extable: Skip sorting if sorted at build time. 2012-04-19 15:06:55 -07:00
fork.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-10-10 12:02:25 +09:00
freezer.c PM / Freezer: Remove references to TIF_FREEZE in comments 2012-03-04 23:08:54 +01:00
futex.c futex: Handle futex_pi OWNER_DIED take over correctly 2012-11-01 12:06:54 +01:00
futex_compat.c futex: Mark get_robust_list as deprecated 2012-03-29 11:37:17 +02:00
groups.c userns: Convert in_group_p and in_egroup_p to use kgid_t 2012-05-03 03:29:33 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c hung task debugging: Inject NMI when hung and going to panic 2012-04-25 12:39:25 +02:00
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c jump_label: Export jump_label_rate_limit() 2012-08-06 19:00:35 +03:00
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
kexec.c kdump: remove unneeded include 2012-10-06 03:05:19 +09:00
kfifo.c [media] kernel:kfifo: export __kfifo_max_r symbol 2012-04-11 18:24:37 -03:00
kmod.c infrastructure for saner ret_from_kernel_thread semantics 2012-10-12 13:35:07 -04:00
kprobes.c kprobes/x86: Fix to support jprobes on ftrace-based kprobe 2012-09-13 22:52:11 -04:00
ksysfs.c kernel: ksysfs.c is implicitly using stat.h 2011-10-31 09:20:13 -04:00
kthread.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-10-13 10:05:52 +09:00
latencytop.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep.c lockdep: Check if nested lock is actually held 2012-09-13 17:00:44 +02:00
lockdep_internals.h
lockdep_proc.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lockdep_states.h
modsign_pubkey.c MODSIGN: Provide module signing public keys to the kernel 2012-10-10 20:01:22 +10:30
module-internal.h MODSIGN: Move the magic string to the end of a module and eliminate the search 2012-10-19 17:30:40 -07:00
module.c module: fix out-by-one error in kallsyms 2012-10-31 13:56:37 +10:30
module_signing.c module_signing: fix printk format warning 2012-10-22 08:56:34 +03:00
mutex-debug.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c sched/rt: Use schedule_preempt_disabled() 2012-03-01 10:28:03 +01:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
nsproxy.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
padata.c padata: Fix cpu hotplug 2012-03-29 19:52:46 +08:00
panic.c panic: fix a possible deadlock in panic() 2012-07-30 17:25:13 -07:00
params.c params: replace printk(KERN_<LVL>...) with pr_<lvl>(...) 2012-05-04 17:28:18 -07:00
pid.c net ip6 flowlabel: Make owner a union of struct pid * and kuid_t 2012-08-14 21:49:25 -07:00
pid_namespace.c pidns: limit the nesting depth of pid namespaces 2012-10-25 14:37:53 -07:00
posix-cpu-timers.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
posix-timers.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
printk.c printk: Fix scheduling-while-atomic problem in console_cpu_notify() 2012-10-16 18:17:44 -07:00
profile.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
ptrace.c ptrace: mark __ptrace_may_access() static 2012-08-03 14:47:17 +10:00
range.c range: fix bogus misuse of module.h to get printk() 2011-10-31 09:20:11 -04:00
rcu.h rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit() 2012-02-21 09:06:12 -08:00
rcupdate.c rcu: Add PROVE_RCU_DELAY to provoke difficult races 2012-09-23 07:42:49 -07:00
rcutiny.c rcu: Move TINY_RCU quiescent state out of extended quiescent state 2012-09-23 07:42:52 -07:00
rcutiny_plugin.h rcu: Move TINY_PREEMPT_RCU away from raw_local_irq_save() 2012-09-23 07:42:51 -07:00
rcutorture.c rcu: Prevent initialization race in rcutorture kthreads 2012-09-23 07:42:23 -07:00
rcutree.c rcu: Grace-period initialization excludes only RCU notifier 2012-10-08 09:06:38 -07:00
rcutree.h rcu: Grace-period initialization excludes only RCU notifier 2012-10-08 09:06:38 -07:00
rcutree_plugin.h rcu: Make RCU_FAST_NO_HZ handle adaptive ticks 2012-09-26 15:44:02 +02:00
rcutree_trace.c Merge remote-tracking branch 'tip/smp/hotplug' into next.2012.09.25b 2012-09-25 10:01:45 -07:00
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c rescounters: add res_counter_uncharge_until() 2012-05-29 16:22:27 -07:00
resource.c kernel/resource.c: fix stack overflow in __reserve_region_with_split() 2012-10-06 03:05:31 +09:00
rtmutex-debug.c lockdep, rtmutex, bug: Show taint flags on error 2011-12-06 08:16:49 +01:00
rtmutex-debug.h
rtmutex-tester.c rtmutex-tester: convert sysdev_class to a regular subsystem 2011-12-14 14:54:22 -08:00
rtmutex.c Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" 2011-12-11 10:33:18 -08:00
rtmutex.h
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rwsem.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
seccomp.c seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER 2012-04-18 12:24:52 +10:00
semaphore.c semaphore: fix improper comment reference to mutex 2012-04-05 17:15:55 -07:00
signal.c coredump: pass siginfo_t* to do_coredump() and below, not merely signr 2012-10-06 03:05:16 +09:00
smp.c smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]() 2012-06-05 17:27:14 +02:00
smpboot.c hotplug: Fix UP bug in smpboot hotplug code 2012-08-13 17:01:07 +02:00
smpboot.h smpboot: Provide infrastructure for percpu hotplug threads 2012-08-13 17:01:07 +02:00
softirq.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:43:39 -07:00
spinlock.c locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
srcu.c workqueue: deprecate system_nrt[_freezable]_wq 2012-08-20 14:51:24 -07:00
stacktrace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stop_machine.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
sys.c use clamp_t in UNAME26 fix 2012-10-19 18:51:17 -07:00
sys_ni.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
sysctl.c mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate 2012-12-11 14:42:46 +00:00
sysctl_binary.c mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads 2012-07-31 18:42:40 -07:00
task_work.c task_work: task_work_add() should not succeed after exit_task_work() 2012-09-13 16:47:34 +02:00
taskstats.c taskstats: cgroupstats_user_cmd() may leak on error 2012-10-06 03:05:31 +09:00
test_kprobes.c
time.c time: Move update_vsyscall definitions to timekeeper_internal.h 2012-09-24 12:38:06 -04:00
timeconst.pl
timer.c timers: Fix endless looping between cascade() and internal_add_timer() 2012-10-09 21:27:14 +02:00
tracepoint.c static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]() 2012-02-24 10:05:59 +01:00
tsacct.c userns: Convert taskstats to handle the user and pid namespaces. 2012-09-18 01:01:32 -07:00
uid16.c userns: Convert setting and getting uid and gid system calls to use kuid and kgid 2012-05-03 03:28:41 -07:00
up.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user-return-notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
user_namespace.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
utsname.c userns: Use cred->user_ns instead of cred->user->user_ns 2012-04-07 16:55:51 -07:00
utsname_sysctl.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
wait.c lockdep/waitqueues: Add better annotation 2011-12-21 10:07:39 +01:00
watchdog.c watchdog: Use hotplug thread infrastructure 2012-08-13 17:01:07 +02:00
workqueue.c workqueue: cancel_delayed_work() should return %false if work item is idle 2012-10-24 12:38:16 -07:00
workqueue_sched.h