linux/kernel
Rik van Riel 7e2703e609 sched/numa: Normalize faults_cpu stats and weigh by CPU use
Tracing the code that decides the active nodes has made it abundantly clear
that the naive implementation of the faults_from code has issues.

Specifically, the garbage collector in some workloads will access orders
of magnitudes more memory than the threads that do all the active work.
This resulted in the node with the garbage collector being marked the only
active node in the group.

This issue is avoided if we weigh the statistics by CPU use of each task in
the numa group, instead of by how many faults each thread has occurred.

To achieve this, we normalize the number of faults to the fraction of faults
that occurred on each node, and then multiply that fraction by the fraction
of CPU time the task has used since the last time task_numa_placement was
invoked.

This way the nodes in the active node mask will be the ones where the tasks
from the numa group are most actively running, and the influence of eg. the
garbage collector and other do-little threads is properly minimized.

On a 4 node system, using CPU use statistics calculated over a longer interval
results in about 1% fewer page migrations with two 32-warehouse specjbb runs
on a 4 node system, and about 5% fewer page migrations, as well as 1% better
throughput, with two 8-warehouse specjbb runs, as compared with the shorter
term statistics kept by the scheduler.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Link: http://lkml.kernel.org/r/1390860228-21539-7-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 15:03:10 +01:00
..
cpu sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding 2014-01-13 17:38:55 +01:00
debug kdb: Add support for external NMI handler to call KGDB/KDB 2013-10-03 18:47:54 +02:00
events Merge branch 'perf/urgent' into perf/core 2014-01-16 09:33:30 +01:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-12-02 10:15:39 -08:00
locking Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 10:42:08 -08:00
power PM / sleep: Fix memory leak in pm_vt_switch_unregister(). 2013-12-22 00:56:35 +01:00
printk printk.c: comments should refer to /proc/vmcore instead of /proc/vmcoreinfo 2013-11-13 12:09:14 +09:00
rcu Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 10:25:12 -08:00
sched sched/numa: Normalize faults_cpu stats and weigh by CPU use 2014-01-28 15:03:10 +01:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 11:34:26 -08:00
trace sched/clock, x86: Use a static_key for sched_clock_stable 2014-01-13 15:13:13 +01:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
Makefile KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y 2013-12-13 15:59:11 +00:00
acct.c
async.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2013-11-21 19:18:14 -08:00
audit.h audit: call audit_bprm() only once to add AUDIT_EXECVE information 2013-11-05 11:15:03 -05:00
audit_tree.c
audit_watch.c
auditfilter.c audit: do not reject all AUDIT_INODE filter types 2013-11-05 11:09:16 -05:00
auditsc.c audit: fix type of sessionid in audit_set_loginuid() 2013-11-06 11:47:24 -05:00
backtracetest.c
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c xfs: update for v3.12-rc1 2013-09-09 11:19:09 -07:00
cgroup.c cgroup: don't recycle cgroup id until all csses' have been destroyed 2013-12-17 08:11:52 -05:00
cgroup_freezer.c cgroup: make css_for_each_descendant() and friends include the origin css in the iteration 2013-08-08 20:11:27 -04:00
compat.c
configs.c
context_tracking.c context_tracking: Wrap static key check into more intuitive function name 2013-12-02 20:43:14 +01:00
cpu.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:11 +09:00
cpu_pm.c
cpuset.c cpuset: Fix memory allocator deadlock 2013-11-27 13:52:47 -05:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c ptrace: revert "Prepare to fix racy accesses on task breakpoints" 2013-07-09 10:33:26 -07:00
extable.c kernel/extable: fix address-checks for core_kernel and init areas 2013-11-28 09:49:41 -08:00
fork.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 10:42:08 -08:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 10:42:08 -08:00
futex_compat.c
groups.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
hrtimer.c sched/deadline: Add SCHED_DEADLINE structures & implementation 2014-01-13 13:41:06 +01:00
hung_task.c Here are the 3.13 KVM changes. There was a lot of work on the PPC 2013-11-15 13:51:36 +09:00
irq_work.c
itimer.c
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c
kcmp.c
kexec.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
kmod.c kernel/kmod.c: check for NULL in call_usermodehelper_exec() 2013-09-30 14:31:02 -07:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c kernel: replace strict_strto*() with kstrto*() 2013-09-12 15:38:03 -07:00
kthread.c kthread: make kthread_create() killable 2013-11-13 12:08:59 +09:00
latencytop.c
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c Mainly boring here, too. rmmod --wait finally removed, though. 2013-11-15 13:27:50 +09:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
notifier.c
nsproxy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
padata.c padata: make the sequence counter an atomic_t 2013-10-30 12:02:58 +08:00
panic.c panic: Make panic_timeout configurable 2013-11-26 12:12:26 +01:00
params.c kernel/params: fix handling of signed integer types 2013-09-28 12:35:52 -07:00
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
pid_namespace.c pid_namespace: make freeing struct pid_namespace rcu-delayed 2013-10-24 23:43:29 -04:00
posix-cpu-timers.c posix-timers: Convert abuses of BUG_ON to WARN_ON 2013-12-09 16:56:29 +01:00
posix-timers.c
profile.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-13 12:09:33 +09:00
range.c
reboot.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
relay.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
res_counter.c memcg: reduce function dereference 2013-09-12 15:38:02 -07:00
resource.c
seccomp.c
signal.c constify copy_siginfo_to_user{,32}() 2013-11-09 00:16:29 -05:00
smp.c kernel: fix generic_exec_single indentation 2013-11-15 09:32:22 +09:00
smpboot.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
smpboot.h
softirq.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 11:34:26 -08:00
stacktrace.c
stop_machine.c stop_machine: Fix race between stop_two_cpus() and stop_cpus() 2013-11-11 12:43:38 +01:00
sys.c kernel/sys.c: remove obsolete #include <linux/kexec.h> 2013-11-13 12:09:13 +09:00
sys_ni.c
sysctl.c sched/numa, mm: Remove p->numa_migrate_deferred 2014-01-28 13:17:04 +01:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c task_work: documentation 2013-09-11 15:58:27 -07:00
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...) 2013-11-19 14:59:50 +01:00
tracepoint.c
tsacct.c
uid16.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
up.c kernel: provide a __smp_call_function_single stub for !CONFIG_SMP 2013-11-15 09:32:22 +09:00
user-return-notifier.c
user.c KEYS: fix uninitialized persistent_keyring_register_sem 2013-12-13 15:59:11 +00:00
user_namespace.c KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches 2013-09-24 10:35:19 +01:00
utsname.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
utsname_sysctl.c
watchdog.c watchdog: update watchdog_thresh properly 2013-09-24 17:00:25 -07:00
workqueue.c PCI updates for v3.13: 2013-12-15 11:45:27 -08:00
workqueue_internal.h