linux

Mel Gorman 7347fc87df sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()

If wake_affine() pulls a task to another node for any reason and the node is no longer preferred then temporarily stop automatic NUMA balancing pulling the task back. Otherwise, tasks with a strong waker/wakee relationship may constantly fight automatic NUMA balancing over where a task should be placed.

Once again netperf is interesting here. The performance barely changes but automatic NUMA balancing is interesting:

 Hmean     send-64         354.67 (   0.00%)      352.15 (  -0.71%)
 Hmean     send-128        702.91 (   0.00%)      693.84 (  -1.29%)
 Hmean     send-256       1350.07 (   0.00%)     1344.19 (  -0.44%)
 Hmean     send-1024      5124.38 (   0.00%)     4941.24 (  -3.57%)
 Hmean     send-2048      9687.44 (   0.00%)     9624.45 (  -0.65%)
 Hmean     send-3312     14577.64 (   0.00%)    14514.35 (  -0.43%)
 Hmean     send-4096     16393.62 (   0.00%)    16488.30 (   0.58%)
 Hmean     send-8192     26877.26 (   0.00%)    26431.63 (  -1.66%)
 Hmean     send-16384    38683.43 (   0.00%)    38264.91 (  -1.08%)
 Hmean     recv-64         354.67 (   0.00%)      352.15 (  -0.71%)
 Hmean     recv-128        702.91 (   0.00%)      693.84 (  -1.29%)
 Hmean     recv-256       1350.07 (   0.00%)     1344.19 (  -0.44%)
 Hmean     recv-1024      5124.38 (   0.00%)     4941.24 (  -3.57%)
 Hmean     recv-2048      9687.43 (   0.00%)     9624.45 (  -0.65%)
 Hmean     recv-3312     14577.59 (   0.00%)    14514.35 (  -0.43%)
 Hmean     recv-4096     16393.55 (   0.00%)    16488.20 (   0.58%)
 Hmean     recv-8192     26876.96 (   0.00%)    26431.29 (  -1.66%)
 Hmean     recv-16384    38682.41 (   0.00%)    38263.94 (  -1.08%)

 NUMA alloc hit                 1465986     1423090
 NUMA alloc miss                      0           0
 NUMA interleave hit                  0           0
 NUMA alloc local               1465897     1423003
 NUMA base PTE updates             1473        1420
 NUMA huge PMD updates                0           0
 NUMA page range updates           1473        1420
 NUMA hint faults                  1383        1312
 NUMA hint local faults             451         124
 NUMA hint local percent             32           9

There is a slight degrading in performance but there are slightly fewer NUMA faults.
There is a large drop in the percentage of local faults but the bulk of migrations for netperf are in small shared libraries so it's reflecting the fact that automatic NUMA balancing has backed off. This is a case where despite wake_affine() and automatic NUMA balancing fighting for placement that there is a marginal benefit to rescheduling to local data quickly. However, it should be noted that wake_affine() and automatic NUMA balancing fighting each other constantly is undesirable.

However, the benefit in other cases is large. This is the result for NAS with the D class sizing on a 4-socket machine:

 nas-mpi
                          4.15.0                 4.15.0
                    sdnuma-v1r23       delayretry-v1r23
 Time cg.D      557.00 (   0.00%)      431.82 (  22.47%)
 Time ep.D       77.83 (   0.00%)       79.01 (  -1.52%)
 Time is.D       26.46 (   0.00%)       26.64 (  -0.68%)
 Time lu.D      727.14 (   0.00%)      597.94 (  17.77%)
 Time mg.D      191.35 (   0.00%)      146.85 (  23.26%)

                                    4.15.0            4.15.0
                              sdnuma-v1r23  delayretry-v1r23
 User                             75665.20          70413.30
 System                           20321.59           8861.67
 Elapsed                            766.13            634.92

 Minor Faults                     16528502           7127941
 Major Faults                         4553              5068
 NUMA alloc local                  6963197           6749135
 NUMA base PTE updates           366409093         107491434
 NUMA huge PMD updates              687556            198880
 NUMA page range updates         718437765         209317994
 NUMA hint faults                 13643410           4601187
 NUMA hint local faults            9212593           3063996
 NUMA hint local percent                67                66

Note the massive reduction in system CPU usage even though the percentage of local faults is barely affected. There is a massive reduction in the number of PTE updates showing that automatic NUMA balancing has backed off. A critical observation is also that there is a massive reduction in minor faults which is due to far fewer NUMA hinting faults being trapped.

There were questions on NAS OMP and how it behaved related to threads being bound to CPUs.
First, there are more gains than losses with this patch applied and a reduction in system CPU usage:

 nas-omp
                      4.16.0-rc1             4.16.0-rc1
                     sdnuma-v2r1        delayretry-v2r1
 Time bt.D      436.71 (   0.00%)      430.05 (   1.53%)
 Time cg.D      201.02 (   0.00%)      180.87 (  10.02%)
 Time ep.D       32.84 (   0.00%)       32.68 (   0.49%)
 Time is.D        9.63 (   0.00%)        9.64 (  -0.10%)
 Time lu.D      331.20 (   0.00%)      304.80 (   7.97%)
 Time mg.D       54.87 (   0.00%)       52.72 (   3.92%)
 Time sp.D     1108.78 (   0.00%)      917.10 (  17.29%)
 Time ua.D      378.81 (   0.00%)      398.83 (  -5.28%)

             4.16.0-rc1       4.16.0-rc1
            sdnuma-v2r1  delayretry-v2r1
 User         305633.08        296751.91
 System          451.75           357.80
 Elapsed        2595.73          2368.13

However, it does not close the gap between binding and being unbound. There is negligible difference between the performance of the baseline and a patched kernel when threads are bound so it is not presented here:

                      4.16.0-rc1             4.16.0-rc1
                 delayretry-bind     delayretry-unbound
 Time bt.D      385.02 (   0.00%)      430.05 ( -11.70%)
 Time cg.D      144.02 (   0.00%)      180.87 ( -25.59%)
 Time ep.D       32.85 (   0.00%)       32.68 (   0.52%)
 Time is.D       10.52 (   0.00%)        9.64 (   8.37%)
 Time lu.D      285.31 (   0.00%)      304.80 (  -6.83%)
 Time mg.D       43.21 (   0.00%)       52.72 ( -22.01%)
 Time sp.D      820.24 (   0.00%)      917.10 ( -11.81%)
 Time ua.D      337.09 (   0.00%)      398.83 ( -18.32%)

                  4.16.0-rc1          4.16.0-rc1
             delayretry-bind  delayretry-unbound
 User              277731.25           296751.91
 System               261.29              357.80
 Elapsed             2100.55             2368.13

Unfortunately, while performance is improved by the patch, there is still quite a long way to go before it's equivalent to hard binding.

Other workloads like hackbench, tbench, dbench and schbench are barely affected. dbench shows a mix of gains and losses depending on the machine although in general, the results are more stable.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-7-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2018-02-21 08:49:45 +01:00
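
The back-off described in the commit message above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct task`, `delay_numa_retry()`, `numa_may_retry()`, the `SCAN_DELAY_MS` floor and the factor of four on the scan period are all hypothetical names and values chosen for illustration, and time is modelled as plain milliseconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-task NUMA balancing state. */
struct task {
    int preferred_nid;            /* -1: no preferred node chosen yet */
    unsigned long scan_period_ms; /* current NUMA scan period */
    unsigned long migrate_retry;  /* earliest time balancing may retry placement */
};

/* Assumed lower bound for the back-off interval (illustrative value). */
#define SCAN_DELAY_MS 1000UL

static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/*
 * Model of the idea in the commit message: after a wakeup has moved the
 * task from prev_node to target_node, push back the time at which
 * automatic NUMA balancing may try to pull the task back.
 */
static void delay_numa_retry(struct task *p, int prev_node, int target_node,
                             unsigned long now_ms)
{
    assert(p != NULL);

    /* No preference yet: keep gathering data, nothing to fight over. */
    if (p->preferred_nid == -1)
        return;

    /* The wakeup did not change the node, so locality is unaffected. */
    if (prev_node == target_node)
        return;

    /* The task landed on its preferred node: no reason to back off. */
    if (target_node == p->preferred_nid)
        return;

    /* Assumed back-off: a few scan periods, never less than the floor. */
    p->migrate_retry = now_ms + max_ul(SCAN_DELAY_MS, p->scan_period_ms * 4);
}

/* NUMA balancing would consult this before retrying placement. */
static bool numa_may_retry(const struct task *p, unsigned long now_ms)
{
    return now_ms >= p->migrate_retry;
}
```

In this model, a task with a 1000 ms scan period that is pulled off its preferred node at t = 5000 ms is not reconsidered for placement until t = 9000 ms, which gives a waker/wakee pair time to settle instead of being moved back immediately.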
..
bpf	bpf: sockmap, fix leaking maps with attached but not detached progs	2018-02-06 11:39:32 +01:00
cgroup	kernel/cpuset: current_cpuset_is_being_rebound can be boolean	2018-02-06 18:32:47 -08:00
configs	KVM changes for 4.16	2018-02-10 13:16:35 -08:00
debug	signal: Simplify and fix kdb_send_sig	2018-01-03 18:01:08 -06:00
events	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
gcov	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
irq	irqdomain: Re-use DEFINE_SHOW_ATTRIBUTE() macro	2018-02-16 14:22:34 +00:00
livepatch	Merge branch 'for-4.16/remove-immediate' into for-linus	2018-01-31 16:36:38 +01:00
locking	locking/qspinlock: Ensure node->count is updated before initialising node	2018-02-13 14:50:14 +01:00
power	x86/power: Fix swsusp_arch_resume prototype	2018-02-02 23:33:50 +01:00
printk	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
rcu	SCSI misc on 20180131	2018-01-31 11:23:28 -08:00
sched	sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()	2018-02-21 08:49:45 +01:00
time	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
trace	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
.gitignore
acct.c	kernel/acct.c: fix the acct->needcheck check in check_free_space()	2018-01-04 16:45:09 -08:00
async.c	kernel/async.c: revert "async: simplify lowest_in_progress()"	2018-02-06 18:32:44 -08:00
audit_fsnotify.c	Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2017-05-03 11:05:15 -07:00
audit_tree.c	Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2017-11-14 14:08:20 -08:00
audit_watch.c	audit/stable-4.13 PR 20170816	2017-08-16 16:48:34 -07:00
audit.c	Audit: remove unused audit_log_secctx function	2017-11-10 16:08:47 -05:00
audit.h	audit/stable-4.15 PR 20171113	2017-11-15 13:28:48 -08:00
auditfilter.c	audit: filter PATH records keyed on filesystem magic	2017-11-10 16:08:56 -05:00
auditsc.c	audit/stable-4.15 PR 20171113	2017-11-15 13:28:48 -08:00
backtracetest.c
bounds.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
capability.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compat.c	cpumask: make cpumask_size() return "unsigned int"	2018-02-06 18:32:45 -08:00
configs.c
context_tracking.c
cpu_pm.c	PM / CPU: replace raw_notifier with atomic_notifier	2017-07-31 13:09:49 +02:00
cpu.c	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2017-12-31 12:30:34 -08:00
crash_core.c	kdump: write correct address of mem_section into vmcoreinfo	2018-01-13 10:42:48 -08:00
crash_dump.c
cred.c	doc: ReSTify credentials.txt	2017-05-18 10:30:19 -06:00
delayacct.c	delayacct: Account blkio completion on the correct task	2018-01-16 03:29:36 +01:00
dma.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
elfcore.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
exec_domain.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
exit.c	kernel/exit.c: export abort() to modules	2018-01-04 16:45:09 -08:00
extable.c	kprobes, x86/alternatives: Use text_mutex to protect smp_alt_modules	2017-11-07 12:20:09 +01:00
fail_function.c	error-injection: Support fault injection framework	2018-01-12 17:33:38 -08:00
fork.c	Merge branch 'akpm' (patches from Andrew)	2018-02-06 22:15:42 -08:00
freezer.c
futex_compat.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
futex.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
groups.c	kernel: make groups_sort calling a responsibility group_info allocators	2017-12-14 16:00:49 -08:00
hung_task.c	kernel/hung_task.c: defer showing held locks	2017-05-08 17:15:10 -07:00
irq_work.c	irq/work: Improve the flag definitions	2018-01-08 19:43:15 +01:00
jump_label.c	sched/core: Fix cpu.max vs. cpuhotplug deadlock	2018-01-24 10:03:44 +01:00
kallsyms.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk	2018-02-01 13:36:15 -08:00
kcmp.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c	kcov: detect double association with a single task	2018-02-06 18:32:46 -08:00
kexec_core.c	x86/mm, kexec: Allow kexec to be used with SME	2017-07-18 11:38:04 +02:00
kexec_file.c	resource: Provide resource struct in resource walk callback	2017-11-07 15:35:57 +01:00
kexec_internal.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
kexec.c	kdump: protect vmcoreinfo data under the crash memory	2017-07-12 16:26:00 -07:00
kmod.c	kmod: move #ifdef CONFIG_MODULES wrapper to Makefile	2017-09-08 18:26:51 -07:00
kprobes.c	kprobes: Propagate error from disarm_kprobe_ftrace()	2018-02-16 09:12:58 +01:00
ksysfs.c	kexec: move vmcoreinfo out of the kernel's .bss section	2017-07-12 16:25:59 -07:00
kthread.c	treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts	2017-11-21 16:35:54 -08:00
latencytop.c
Makefile	error-injection: Support fault injection framework	2018-01-12 17:33:38 -08:00
memremap.c	mm: Fix devm_memremap_pages() collision handling	2018-01-19 16:29:56 -08:00
module_signing.c
module-internal.h
module.c	Modules updates for v4.16	2018-02-07 14:29:34 -08:00
notifier.c
nsproxy.c
padata.c	padata: add SPDX identifier	2018-01-05 18:43:00 +11:00
panic.c	kernel/panic.c: add TAINT_AUX	2017-11-17 16:10:04 -08:00
params.c	kernel/params.c: improve STANDARD_PARAM_DEF readability	2017-10-03 17:54:26 -07:00
pid_namespace.c	pid: remove pidhash	2017-11-17 16:10:04 -08:00
pid.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
profile.c
ptrace.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
range.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
reboot.c	kernel/reboot.c: add devm_register_reboot_notifier()	2017-11-17 16:10:04 -08:00
relay.c	vfs: do bulk POLL* -> EPOLL* replacement	2018-02-11 14:34:03 -08:00
resource.c	Merge branch 'akpm' (patches from Andrew)	2018-02-06 22:15:42 -08:00
seccomp.c	Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security	2018-01-31 13:44:45 -08:00
signal.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching	2018-01-31 13:02:18 -08:00
smp.c	smp/core: Use lockdep to assert IRQs are disabled/enabled	2017-11-08 11:13:50 +01:00
smpboot.c	watchdog/core, powerpc: Lock cpus across reconfiguration	2017-10-04 10:53:54 +02:00
smpboot.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
softirq.c	softirq: Eliminate cond_resched_rcu_qs() in favor of cond_resched()	2017-12-04 10:28:58 -08:00
stacktrace.c
stop_machine.c	stop_machine: Provide stop_machine_cpuslocked()	2017-05-26 10:10:36 +02:00
sys_ni.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
sys.c	fix typo in assignment of fs default overflow gid	2017-12-14 16:01:45 -06:00
sysctl_binary.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
sysctl.c	pipe: reject F_SETPIPE_SZ with size over UINT_MAX	2018-02-06 18:32:47 -08:00
task_work.c	locking/barriers: Convert users of lockless_dereference() to READ_ONCE()	2017-12-17 13:57:15 +01:00
taskstats.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
test_kprobes.c	kprobes: Disable the jprobes test code	2017-10-20 11:02:54 +02:00
torture.c	torture: Save a line in stutter_wait(): while -> for	2017-12-11 09:18:30 -08:00
tracepoint.c	tracepoint: Remove smp_read_barrier_depends() from comment	2017-12-04 10:52:56 -08:00
tsacct.c
ucount.c
uid16.c	kernel: make groups_sort calling a responsibility group_info allocators	2017-12-14 16:00:49 -08:00
umh.c	kernel/umh.c: optimize 'proc_cap_handler()'	2017-11-17 16:10:01 -08:00
up.c	smp: Avoid using two cache lines for struct call_single_data	2017-08-29 15:14:38 +02:00
user_namespace.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2017-11-16 12:20:15 -08:00
user-return-notifier.c
user.c	userns: use union in {g,u}idmap struct	2017-10-31 17:22:58 -05:00
utsname_sysctl.c
utsname.c
watchdog_hld.c	Merge branch 'linus' into core/urgent, to pick up dependent commits	2017-11-04 08:53:04 +01:00
watchdog.c	Merge branch 'linus' into sched/core, to pick up fixes	2017-11-08 10:17:15 +01:00
workqueue_internal.h	Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	2017-11-06 12:26:49 -08:00
workqueue.c	Staging/IIO patches for 4.16-rc1	2018-02-01 09:51:57 -08:00