linux/kernel
Mel Gorman 05cbdf4f5c sched/numa: Limit the conditions where scan period is reset
migrate_task_rq_fair() resets the scan rate for NUMA balancing on every
cross-node migration. In the event of excessive load balancing due to
saturation, this may result in the scan rate being pegged at maximum and
further overloading the machine.

This patch only resets the scan if NUMA balancing is active, a preferred
node has been selected and the task is being migrated from the preferred
node as these are the most harmful. For example, a migration to the preferred
node does not justify a faster scan rate. Similarly, a migration between two
nodes that are not preferred is probably bouncing due to over-saturation of
the machine.  In that case, scanning faster and trapping more NUMA faults
will further overload the machine.

Specjbb2005 results (8 warehouses)
Higher bops are better

2 Socket - 2  Node Haswell - X86
JVMS  Prev    Current  %Change
4     203370  205332   0.964744
1     328431  319785   -2.63252

2 Socket - 4 Node Power8 - PowerNV
JVMS  Prev    Current  %Change
1     206070  206585   0.249915

2 Socket - 2  Node Power9 - PowerNV
JVMS  Prev    Current  %Change
4     188386  189162   0.41192
1     201566  213760   6.04963

4 Socket - 4  Node Power7 - PowerVM
JVMS  Prev     Current  %Change
8     59157.4  58736.8  -0.710985
1     105495   105419   -0.0720413

Some events stats before and after applying the patch.

perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
Event                     Before          After
cs                        13,825,492      14,285,708
migrations                1,152,509       1,180,621
faults                    371,948         339,114
cache-misses              55,654,206,041  55,205,631,894
sched:sched_move_numa     1,856           843
sched:sched_stick_numa    4               6
sched:sched_swap_numa     428             219
migrate:mm_migrate_pages  898             365

vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
Event                   Before  After
numa_hint_faults        57146   26907
numa_hint_faults_local  51612   24279
numa_hit                238164  239771
numa_huge_pte_updates   16      0
numa_interleave         63      68
numa_local              238085  239688
numa_other              79      83
numa_pages_migrated     883     363
numa_pte_updates        67540   27415

perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
Event                     Before          After
cs                        3,288,525       3,202,779
migrations                38,652          37,186
faults                    111,678         106,076
cache-misses              12,111,197,376  12,024,873,744
sched:sched_move_numa     900             931
sched:sched_stick_numa    0               0
sched:sched_swap_numa     5               1
migrate:mm_migrate_pages  714             637

vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
Event                   Before  After
numa_hint_faults        18572   17409
numa_hint_faults_local  14850   14367
numa_hit                73197   73953
numa_huge_pte_updates   11      20
numa_interleave         25      25
numa_local              73138   73892
numa_other              59      61
numa_pages_migrated     712     668
numa_pte_updates        24021   27276

perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
Event                     Before       After
cs                        8,451,543    8,474,013
migrations                202,804      254,934
faults                    310,024      320,506
cache-misses              253,522,507  110,580,458
sched:sched_move_numa     213          725
sched:sched_stick_numa    0            0
sched:sched_swap_numa     2            7
migrate:mm_migrate_pages  88           145

vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
Event                   Before  After
numa_hint_faults        11830   22797
numa_hint_faults_local  11301   21539
numa_hit                90038   89308
numa_huge_pte_updates   0       0
numa_interleave         855     865
numa_local              89796   88955
numa_other              242     353
numa_pages_migrated     88      149
numa_pte_updates        12039   22930

perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
Event                     Before     After
cs                        2,049,153  2,195,628
migrations                11,405     11,179
faults                    162,309    149,656
cache-misses              7,203,343  8,117,515
sched:sched_move_numa     22         49
sched:sched_stick_numa    0          0
sched:sched_swap_numa     0          0
migrate:mm_migrate_pages  1          5

vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
Event                   Before  After
numa_hint_faults        1693    3577
numa_hint_faults_local  1669    3476
numa_hit                25177   26142
numa_huge_pte_updates   0       0
numa_interleave         194     358
numa_local              24993   26042
numa_other              184     100
numa_pages_migrated     1       5
numa_pte_updates        1577    3587

perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
Event                     Before           After
cs                        94,515,937       100,602,296
migrations                4,203,554        4,135,630
faults                    832,697          789,256
cache-misses              226,248,698,331  226,160,621,058
sched:sched_move_numa     1,730            1,366
sched:sched_stick_numa    14               16
sched:sched_swap_numa     432              374
migrate:mm_migrate_pages  1,398            1,350

vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
Event                   Before  After
numa_hint_faults        80079   47857
numa_hint_faults_local  68620   39768
numa_hit                241187  240165
numa_huge_pte_updates   0       0
numa_interleave         0       0
numa_local              241186  240165
numa_other              1       0
numa_pages_migrated     1347    1224
numa_pte_updates        80729   48354

perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
Event                     Before          After
cs                        63,704,961      58,515,496
migrations                573,404         564,845
faults                    230,878         245,807
cache-misses              76,568,222,781  73,603,757,976
sched:sched_move_numa     509             996
sched:sched_stick_numa    31              10
sched:sched_swap_numa     182             193
migrate:mm_migrate_pages  541             646

vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
Event                   Before  After
numa_hint_faults        8501    13422
numa_hint_faults_local  2960    5619
numa_hit                35526   36118
numa_huge_pte_updates   0       0
numa_interleave         0       0
numa_local              35526   36116
numa_other              0       2
numa_pages_migrated     539     616
numa_pte_updates        8433    13374

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537552141-27815-5-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 09:42:24 +02:00
..
bpf bpf: sockmap, fix transition through disconnect without close 2018-09-22 02:46:41 +02:00
cgroup Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2018-08-24 13:19:27 -07:00
configs kconfig: tinyconfig: remove stale stack protector fixups 2018-06-15 07:15:28 +09:00
debug treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
dma dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration 2018-09-25 15:11:58 -07:00
events perf/core: Add sanity check to deal with pinned event failure 2018-09-28 22:44:53 +02:00
gcov gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT 2018-06-08 18:56:02 +09:00
irq Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 10:47:26 -07:00
livepatch Merge branch 'for-4.19/upstream' into for-linus 2018-08-20 18:33:50 +02:00
locking locking/ww_mutex: Fix spelling mistake "cylic" -> "cyclic" 2018-09-10 14:00:01 +02:00
power PM / sleep: wakeup: Fix build error caused by missing SRCU support 2018-08-15 00:07:08 +02:00
printk Printk changes for 4.19-rc4 2018-09-13 19:37:08 -10:00
rcu Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
sched sched/numa: Limit the conditions where scan period is reset 2018-10-02 09:42:24 +02:00
time clocksource: Revert "Remove kthread" 2018-09-06 23:38:35 +02:00
trace ring-buffer: Allow for rescheduling when removing pages 2018-09-17 18:15:11 -04:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt kconfig: include kernel/Kconfig.preempt from init/Kconfig 2018-08-02 08:06:54 +09:00
Makefile kbuild: move bin2c back to scripts/ from scripts/basic/ 2018-07-18 01:18:05 +09:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-04 16:45:09 -08:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit.c audit: use ktime_get_coarse_real_ts64() for timestamps 2018-07-17 14:45:08 -04:00
audit.h audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
audit_fsnotify.c fsnotify: add fsnotify_add_inode_mark() wrappers 2018-05-18 14:58:22 +02:00
audit_tree.c \n 2018-08-17 09:41:28 -07:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-07-18 11:43:36 -04:00
auditfilter.c audit: rename FILTER_TYPE to FILTER_EXCLUDE 2018-06-19 10:39:54 -04:00
auditsc.c audit/stable-4.18 PR 20180814 2018-08-15 10:46:54 -07:00
backtracetest.c
bounds.c
capability.c
compat.c time: Enable get/put_compat_itimerspec64 always 2018-06-24 14:39:47 +02:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Prevent state corruption on error rollback 2018-09-06 15:21:38 +02:00
cpu_pm.c
crash_core.c kernel/crash_core.c: print timestamp using time64_t 2018-08-22 10:52:47 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: Use raw_spinlocks 2018-04-27 14:34:51 +02:00
dma.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
elfcore.c
exec_domain.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
exit.c signal: Pass pid type into group_send_sig_info 2018-07-21 12:57:35 -05:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c bpf/error-inject/kprobes: Clear current_kprobe and enable preempt in kprobe 2018-06-21 12:33:19 +02:00
fork.c mm: respect arch_dup_mmap() return value 2018-09-04 16:45:02 -07:00
freezer.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
futex.c futex: Mark expected switch fall-throughs 2018-08-20 18:23:00 +02:00
futex_compat.c
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c kernel/hung_task.c: allow to set checking interval separately from timeout 2018-08-22 10:52:47 -07:00
iomem.c memremap: split devm_memremap_pages() and memremap() infrastructure 2018-05-15 23:08:33 -07:00
irq_work.c irq/work: Improve the flag definitions 2018-01-08 19:43:15 +01:00
jump_label.c jump_label: Fix typo in warning message 2018-09-10 10:15:48 +02:00
kallsyms.c kallsyms, x86: Export addresses of PTI entry trampolines 2018-08-14 19:12:29 -03:00
kcmp.c
kcov.c sched/core / kcov: avoid kcov_area during task switch 2018-06-15 07:55:24 +09:00
kexec.c kexec: add call to LSM hook in original kexec_load syscall 2018-07-16 12:31:57 -07:00
kexec_core.c kexec: yield to scheduler when loading kimage segments 2018-06-15 07:55:24 +09:00
kexec_file.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
kexec_internal.h
kmod.c
kprobes.c kprobes: Replace %p with other pointer types 2018-06-21 17:33:42 +02:00
ksysfs.c
kthread.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
latencytop.c
memremap.c libnvdimm-for-4.19_dax-memory-failure 2018-08-25 18:43:59 -07:00
module-internal.h modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
module.c module: use relative references for __ksymtab entries 2018-08-22 10:52:47 -07:00
module_signing.c modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
notifier.c
nsproxy.c
padata.c padata: add SPDX identifier 2018-01-05 18:43:00 +11:00
panic.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
params.c kernel/params.c: downgrade warning for unsafe parameters 2018-04-11 10:28:37 -07:00
pid.c fork: report pid exhaustion correctly 2018-09-20 22:01:11 +02:00
pid_namespace.c Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-04-03 19:15:32 -07:00
profile.c
ptrace.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
range.c
reboot.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
relay.c kernel/relay.c: change return type to vm_fault_t 2018-06-15 07:55:24 +09:00
resource.c libnvdimm for 4.18 2018-06-08 17:21:52 -07:00
rseq.c rseq: uapi: Declare rseq_cs field as union, update includes 2018-07-10 22:18:52 +02:00
seccomp.c audit/stable-4.18 PR 20180605 2018-06-06 16:34:00 -07:00
signal.c Merge branch 'akpm' (patches from Andrew) 2018-08-22 12:34:08 -07:00
smp.c cpu/hotplug: Fix SMT supported evaluation 2018-08-07 12:25:30 +02:00
smpboot.c smpboot: Remove cpumask from the API 2018-07-03 09:20:44 +02:00
smpboot.h
softirq.c nohz: Fix missing tick reprogram when interrupting an inline softirq 2018-08-03 15:52:10 +02:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
sys.c kernel/sys.c: remove duplicated include 2018-09-20 22:01:11 +02:00
sys_ni.c Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-10 10:17:09 -07:00
sysctl.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-08-23 18:48:43 -07:00
sysctl_binary.c staging: irda: remove remaining remants of irda code removal 2018-04-16 11:26:49 +02:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
test_kprobes.c kprobes: Remove jprobe API implementation 2018-06-21 12:33:05 +02:00
torture.c torture: Keep old-school dmesg format 2018-06-25 11:30:10 -07:00
tracepoint.c kernel: tracepoints: add support for relative references 2018-08-22 10:52:47 -07:00
tsacct.c
ucount.c headers: untangle kmemleak.h from mm.h 2018-04-05 21:36:27 -07:00
uid16.c fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers 2018-04-02 20:15:59 +02:00
uid16.h kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c 2018-04-02 20:15:30 +02:00
umh.c umh: fix race condition 2018-06-07 16:56:28 -04:00
up.c
user-return-notifier.c
user.c userns: use irqsave variant of refcount_dec_and_lock() 2018-08-22 10:52:47 -07:00
user_namespace.c userns: move user access out of the mutex 2018-08-11 02:05:53 -05:00
utsname.c uts: create "struct uts_namespace" from kmem_cache 2018-04-11 10:28:35 -07:00
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-08-11 02:05:53 -05:00
watchdog.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
workqueue.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
workqueue_internal.h workqueue: Set worker->desc to workqueue name by default 2018-05-18 08:47:13 -07:00