linux/kernel/sched
Rik van Riel 095bebf61a sched/numa: Do not move past the balance point if unbalanced
There is a subtle interaction between the logic introduced in commit
e63da03639 ("sched/numa: Allow task switch if load imbalance improves"),
the way the load balancer counts the load on each NUMA node, and the way
NUMA hinting faults are done.

Specifically, the load balancer only counts currently running tasks
in the load, while NUMA hinting faults may cause tasks to stop, if
the page is locked by another task.

This could cause all of the threads of a large single instance workload,
like SPECjbb2005, to migrate to the same NUMA node. This was possible
because occasionally they all fault on the same few pages, and only one
of the threads remains runnable. That thread can move to the process's
preferred NUMA node without making the imbalance worse, because nothing
else is running at that time.

The fix is to check the direction of the net moving of load, and to
refuse a NUMA move if it would cause the system to move past the point
of balance.  In an unbalanced state, only moves that bring us closer
to the balance point are allowed.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: mgorman@suse.de
Link: http://lkml.kernel.org/r/20150203165648.0e9ac692@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 16:18:00 +01:00
..
Makefile sched/idle: Move cpu/idle.c to sched/idle.c 2014-02-11 09:58:30 +01:00
auto_group.c sched/autogroup: Fix failure to set cpu.rt_runtime_us 2015-02-18 16:17:20 +01:00
auto_group.h Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" 2012-12-11 10:23:45 +01:00
clock.c time: Replace __get_cpu_var uses 2014-08-26 13:45:44 -04:00
completion.c sched/completion: Serialize completion_done() with complete() 2015-02-18 14:27:40 +01:00
core.c sched/rt: Avoid obvious configuration fail 2015-02-18 16:17:23 +01:00
cpuacct.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
cpuacct.h sched/cpuacct: Initialize root cpuacct earlier 2013-04-10 13:54:20 +02:00
cpudeadline.c sched/deadline: Remove cpu_active_mask from cpudl_find() 2015-02-04 07:52:29 +01:00
cpudeadline.h sched/deadline: Modify cpudl::free_cpus to reflect rd->online 2015-01-30 19:39:16 +01:00
cpupri.c Merge commit '3cf2f34' into sched/core, to fix build error 2014-06-12 13:46:37 +02:00
cpupri.h sched/cpupri: Remove unnecessary definitions in cpupri.h 2014-11-16 10:58:59 +01:00
cputime.c sched, time: Fix build error with 64 bit cputime_t on 32 bit systems 2014-10-03 05:46:55 +02:00
deadline.c sched/dl: Do update_rq_clock() in yield_task_dl() 2015-02-18 16:17:12 +01:00
debug.c sched/debug: Print rq->clock_task 2015-01-14 13:34:22 +01:00
fair.c sched/numa: Do not move past the balance point if unbalanced 2015-02-18 16:18:00 +01:00
features.h sched: Rename capacity related flags 2014-06-05 11:52:32 +02:00
idle.c sched/idle: Add missing checks to the exit condition of cpu_idle_poll() 2015-01-30 19:38:52 +01:00
idle_task.c sched: Provide update_curr callbacks for stop/idle scheduling classes 2014-11-23 14:14:40 -08:00
proc.c cpuidle: menu: Lookup CPU runqueues less 2014-08-06 21:17:45 +02:00
rt.c sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target 2015-01-30 19:38:49 +01:00
sched.h sched: Make dl_task_time() use task_rq_lock() 2015-02-18 14:27:30 +01:00
stats.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
stats.h sched: Micro-optimize by dropping unnecessary task_rq() calls 2013-09-25 13:51:06 +02:00
stop_task.c sched: Provide update_curr callbacks for stop/idle scheduling classes 2014-11-23 14:14:40 -08:00
wait.c sched/wait: Fix a kthread race with wait_woken() 2014-11-04 07:17:44 +01:00