linux/kernel/sched
Peter Zijlstra 10e2f1acd0 sched/core: Rewrite and improve select_idle_siblings()
select_idle_siblings() is a known pain point for a number of
workloads; it either does too much or not enough and sometimes just
gets it plain wrong.

This rewrite attempts to address a number of issues (but sadly not
all).

The current code does an unconditional sched_domain iteration, with
the intent of finding an idle core (on SMT hardware). The problems
this patch tries to address are:

 - it's pointless to look for idle cores if the machine is really busy;
   at that point you're just wasting cycles.

 - its behaviour is inconsistent between SMT and !SMT hardware in
   that !SMT hardware ends up doing a scan for any idle CPU in the LLC
   domain, while SMT hardware does a scan for idle cores and, if that
   fails, falls back to a scan for idle threads on the 'target' core.

The new code replaces the sched_domain scan with 3 explicit scans:

 1) search for an idle core in the LLC
 2) search for an idle CPU in the LLC
 3) search for an idle thread in the 'target' core

where 1 and 3 are conditional on SMT support and 1 and 2 have runtime
heuristics to skip the step.
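
For illustration, the combined flow looks roughly like the sketch
below. This is a simplified stand-in rather than the kernel code: the
toy topology (NR_CPUS, the cpu_idle[] array, a single has_idle_cores
bool) is assumed, and the three scan helpers, named after the
select_idle_core()/select_idle_cpu()/select_idle_smt() helpers the
patch introduces, are stubbed out here and sketched per step below.

#include <stdbool.h>

#define NR_CPUS 8                       /* toy LLC with 8 CPUs */

static bool cpu_idle[NR_CPUS];          /* stand-in for per-CPU idle state */
static bool cpu_has_smt = true;         /* assumed SMT topology */
static bool has_idle_cores;             /* models sd_llc_shared->has_idle_cores */

static bool scan_worthwhile(void)       /* step 2 cut-off, sketched below */
{
        return true;
}

static int select_idle_core(int target) { return -1; }  /* step 1, see below */
static int select_idle_cpu(int target)  { return -1; }  /* step 2, see below */
static int select_idle_smt(int target)  { return -1; }  /* step 3, see below */

/* Three explicit scans replace the old sched_domain iteration. */
static int select_idle_sibling(int target)
{
        int cpu = -1;

        if (cpu_idle[target])                   /* fast path: target is idle */
                return target;

        if (cpu_has_smt && has_idle_cores)      /* 1) idle core in the LLC */
                cpu = select_idle_core(target);

        if (cpu < 0 && scan_worthwhile())       /* 2) any idle CPU in the LLC */
                cpu = select_idle_cpu(target);

        if (cpu < 0 && cpu_has_smt)             /* 3) idle thread in target core */
                cpu = select_idle_smt(target);

        return cpu >= 0 ? cpu : target;         /* nothing idle: stay on target */
}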

Step 1) is conditional on sd_llc_shared->has_idle_cores; when a CPU
goes idle and sd_llc_shared->has_idle_cores is false, we scan all SMT
siblings of the CPU going idle. Similarly, we clear
sd_llc_shared->has_idle_cores when we fail to find an idle core.
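
A minimal sketch of that hysteresis, on the same toy topology as
above (the helper names here are illustrative; in the patch the flag
lives in the shared LLC sched_domain data and the idle-entry hook is
called from the idle path):

#include <stdbool.h>

#define NR_CPUS      8
#define SMT_PER_CORE 2                  /* assumed: 2 SMT threads per core */

static bool cpu_idle[NR_CPUS];
static bool has_idle_cores;             /* models sd_llc_shared->has_idle_cores */

/* Called when @cpu enters idle: only its own core's siblings are checked. */
static void update_idle_core(int cpu)
{
        int first = cpu - (cpu % SMT_PER_CORE), sibling;

        if (has_idle_cores)             /* already known to have an idle core */
                return;

        for (sibling = first; sibling < first + SMT_PER_CORE; sibling++)
                if (sibling != cpu && !cpu_idle[sibling])
                        return;         /* a sibling is still busy */

        has_idle_cores = true;          /* this whole core just went idle */
}

/* Called when a full LLC scan fails to find an idle core. */
static void clear_idle_cores(void)
{
        has_idle_cores = false;         /* suppress scan 1 until a core idles */
}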

Step 2) tracks the average cost of the scan and compares this to the
average idle time estimate for the CPU doing the wakeup. There is a
significant fudge factor involved to deal with the variability of the
averages; hackbench in particular was sensitive to this.
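
Roughly, the cut-off behaves like the sketch below; the averaging and
the factor shown are illustrative assumptions, while the patch itself
keeps a per-domain average scan cost and compares it against the
waking CPU's avg_idle estimate:

#include <stdbool.h>
#include <stdint.h>

struct wakeup_stats {
        uint64_t avg_idle;        /* ns: estimated idle time of the waking CPU */
        uint64_t avg_scan_cost;   /* ns: running average cost of one LLC scan */
};

/* Only scan the LLC when the expected idle time clearly exceeds the cost. */
static bool idle_cpu_scan_worthwhile(const struct wakeup_stats *st)
{
        /*
         * Generous fudge factor (value is illustrative): both averages
         * are noisy, and hackbench was especially sensitive to this.
         */
        return st->avg_idle > 4 * st->avg_scan_cost;
}

/* Fold the measured cost of the last scan into a simple running average. */
static void update_scan_cost(struct wakeup_stats *st, uint64_t this_cost)
{
        st->avg_scan_cost = (7 * st->avg_scan_cost + this_cost) / 8;
}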

Step 3) is unconditional; we assume (also per step 1) that scanning
all SMT siblings in a core is 'cheap'.
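
Scan 3 is bounded by the core width, which is why it can stay
unconditional; a minimal sketch on the toy topology used above:

#include <stdbool.h>

#define NR_CPUS      8
#define SMT_PER_CORE 2                  /* assumed: 2 SMT threads per core */

static bool cpu_idle[NR_CPUS];

/* 3) Look only at the SMT siblings of the target CPU. */
static int select_idle_smt(int target)
{
        int first = target - (target % SMT_PER_CORE), cpu;

        /* At most SMT_PER_CORE iterations: no LLC-wide walk here. */
        for (cpu = first; cpu < first + SMT_PER_CORE; cpu++)
                if (cpu_idle[cpu])
                        return cpu;

        return -1;                      /* no idle thread in the target core */
}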

With this, SMT systems gain step 2, which cures a few benchmarks --
notably one from Facebook.

One 'feature' of the sched_domain iteration, which we preserve in the
new code, is that it starts scanning from the 'target' CPU instead of
walking the cpumask in CPU id order. This reduces the chance of
multiple CPUs in the LLC that are scanning for idle at the same time
all ganging up on the same CPU. The downside is that tasks can end up
hopping across the LLC for no apparent reason.
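
The preserved scan order amounts to starting the walk at the target
CPU and wrapping around the LLC, roughly as below (illustrative; the
real code walks a cpumask spanning the LLC domain):

#include <stdbool.h>

#define NR_CPUS 8                       /* toy LLC with 8 CPUs */

static bool cpu_idle[NR_CPUS];

/* Start at @target and wrap, so concurrent scans start at different points. */
static int scan_llc_from(int target)
{
        int i, cpu;

        for (i = 0; i < NR_CPUS; i++) {
                cpu = (target + i) % NR_CPUS;
                if (cpu_idle[cpu])
                        return cpu;
        }
        return -1;                      /* no idle CPU in the LLC */
}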

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 11:03:09 +02:00
File | Latest commit | Date
Makefile | cpufreq: schedutil: New governor based on scheduler utilization data | 2016-04-02 01:09:12 +02:00
auto_group.c | sched/core: Move the sched_to_prio[] arrays out of line | 2015-12-04 10:34:46 +01:00
auto_group.h | sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() | 2015-05-08 12:11:32 +02:00
clock.c | sched/clock: Make local_clock()/cpu_clock() inline | 2016-04-13 12:25:22 +02:00
completion.c | sched/completion: Serialize completion_done() with complete() | 2015-02-18 14:27:40 +01:00
core.c | sched/core: Rewrite and improve select_idle_siblings() | 2016-09-30 11:03:09 +02:00
cpuacct.c | sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together | 2016-07-09 13:56:15 +02:00
cpuacct.h | sched/cpuacct: Simplify the cpuacct code | 2016-03-21 11:00:28 +01:00
cpudeadline.c | sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear() | 2016-09-05 13:29:43 +02:00
cpudeadline.h | sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear() | 2016-09-05 13:29:43 +02:00
cpufreq.c | cpufreq: sched: Helpers to add and remove update_util hooks | 2016-04-02 01:08:43 +02:00
cpufreq_schedutil.c | cpufreq: schedutil: map raw required frequency to driver frequency | 2016-07-21 22:28:21 +02:00
cpupri.c | sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed | 2016-05-12 09:55:35 +02:00
cpupri.h | sched/cpupri: Remove unnecessary definitions in cpupri.h | 2014-11-16 10:58:59 +01:00
cputime.c | sched/cputime: Improve scalability by not accounting thread group tasks pending runtime | 2016-08-18 11:53:46 +02:00
deadline.c | sched/deadline: Fix the intention to re-evalute tick dependency for offline CPU | 2016-09-05 13:29:45 +02:00
debug.c | sched/debug: Remove several CONFIG_SCHEDSTATS guards | 2016-09-05 13:29:47 +02:00
fair.c | sched/core: Rewrite and improve select_idle_siblings() | 2016-09-30 11:03:09 +02:00
features.h | sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define | 2015-09-13 09:52:55 +02:00
idle.c | Merge branch 'sched/urgent' into sched/core, to pick up fixes | 2016-06-14 11:04:13 +02:00
idle_task.c | sched/core: Rewrite and improve select_idle_siblings() | 2016-09-30 11:03:09 +02:00
loadavg.c | sched/core: Correct off by one bug in load migration calculation | 2016-07-13 14:58:20 +02:00
rt.c | sched/core: Provide a tsk_nr_cpus_allowed() helper | 2016-05-12 09:55:36 +02:00
sched.h | sched/core: Rewrite and improve select_idle_siblings() | 2016-09-30 11:03:09 +02:00
stats.c | sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks | 2015-02-13 21:21:37 -08:00
stats.h | sched/debug: Rename 'schedstat_val()' -> 'schedstat_val_or_zero()' | 2016-09-05 13:29:46 +02:00
stop_task.c | locking/lockdep, sched/core: Implement a better lock pinning scheme | 2016-05-05 09:23:59 +02:00
swait.c | wait.[ch]: Introduce the simple waitqueue (swait) implementation | 2016-02-25 11:27:16 +01:00
wait.c | sched/wait: Introduce init_wait_entry() | 2016-09-30 10:54:03 +02:00