linux/kernel
Kirill Tkhai 67dfa1b756 sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()
Currently used hrtimer_try_to_cancel() is racy:

raw_spin_lock(&rq->lock)
...                            dl_task_timer                 raw_spin_lock(&rq->lock)
...                               raw_spin_lock(&rq->lock)   ...
   switched_from_dl()             ...                        ...
      hrtimer_try_to_cancel()     ...                        ...
   switched_to_fair()             ...                        ...
...                               ...                        ...
...                               ...                        ...
raw_spin_unlock(&rq->lock)        ...                        (asquired)
...                               ...                        ...
...                               ...                        ...
do_exit()                         ...                        ...
   schedule()                     ...                        ...
      raw_spin_lock(&rq->lock)    ...                        raw_spin_unlock(&rq->lock)
      ...                         ...                        ...
      raw_spin_unlock(&rq->lock)  ...                        raw_spin_lock(&rq->lock)
      ...                         ...                        (asquired)
      put_task_struct()           ...                        ...
          free_task_struct()      ...                        ...
      ...                         ...                        raw_spin_unlock(&rq->lock)
...                               (asquired)                 ...
...                               ...                        ...
...                               (use after free)           ...

So, let's implement 100% guaranteed way to cancel the timer and let's
be sure we are safe even in very unlikely situations.

rq unlocking does not limit the area of switched_from_dl() use, because
this has already been possible in pull_dl_task() below.

Let's consider the safety of of this unlocking. New code in the patch
is working when hrtimer_try_to_cancel() fails. This means the callback
is running. In this case hrtimer_cancel() is just waiting till the
callback is finished. Two

1) Since we are in switched_from_dl(), new class is not dl_sched_class and
new prio is not less MAX_DL_PRIO. So, the callback returns early; it's
right after !dl_task() check. After that hrtimer_cancel() returns back too.

The above is:

raw_spin_lock(rq->lock);                  ...
...                                       dl_task_timer()
...                                          raw_spin_lock(rq->lock);
   switched_from_dl()                        ...
       hrtimer_try_to_cancel()               ...
          raw_spin_unlock(rq->lock);         ...
          hrtimer_cancel()                   ...
          ...                                raw_spin_unlock(rq->lock);
          ...                                return HRTIMER_NORESTART;
          ...                             ...
          raw_spin_lock(rq->lock);        ...

2) But the below is also possible:
                                   dl_task_timer()
                                      raw_spin_lock(rq->lock);
                                      ...
                                      raw_spin_unlock(rq->lock);
raw_spin_lock(rq->lock);              ...
   switched_from_dl()                 ...
       hrtimer_try_to_cancel()        ...
       ...                            return HRTIMER_NORESTART;
       raw_spin_unlock(rq->lock);  ...
       hrtimer_cancel();           ...
       raw_spin_lock(rq->lock);    ...

In this case hrtimer_cancel() returns immediately. Very unlikely case,
just to mention.

Nobody can manipulate the task, because check_class_changed() is
always called with pi_lock locked. Nobody can force the task to
participate in (concurrent) priority inheritance schemes (the same reason).

All concurrent task operations require pi_lock, which is held by us.
No deadlocks with dl_task_timer() are possible, because it returns
right after !dl_task() check (it does nothing).

If we receive a new dl_task during the time of unlocked rq, we just
don't have to do pull_dl_task() in switched_from_dl() further.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
[ Added comments]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 07:17:50 +01:00
..
bpf bpf: add search pruning optimization to verifier 2014-10-01 21:30:33 -04:00
configs x86: Add "make tinyconfig" to configure the tiniest possible kernel 2014-08-08 16:30:24 -07:00
debug kdb: replace strnicmp with strncasecmp 2014-10-14 02:18:25 +02:00
events Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
gcov ARM: 8153/1: Enable gcov support on the ARM architecture 2014-09-26 14:39:57 +01:00
irq Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
locking locking/mutex: Don't assume TASK_RUNNING 2014-10-28 10:55:08 +01:00
power Merge branch 'freezer' 2014-10-23 23:02:45 +02:00
printk Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
rcu rcu: Eliminate deadlock between CPU hotplug and expedited grace periods 2014-09-18 16:22:27 -07:00
sched sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl() 2014-11-04 07:17:50 +01:00
time Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
trace Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER 2014-07-16 14:57:13 +02:00
Kconfig.preempt
Makefile bin2c: move bin2c in scripts/basic 2014-08-08 15:57:32 -07:00
acct.c acct: eliminate compile warning 2014-10-09 22:26:04 -04:00
async.c kernel/async.c: switch to pr_foo() 2014-10-09 22:26:04 -04:00
audit.c audit, sched/wait: Fixup kauditd_thread() wait loop 2014-11-04 07:17:47 +01:00
audit.h audit: reduce scope of audit_log_fcaps 2014-09-23 16:37:51 -04:00
audit_tree.c audit: rename audit_log_remove_rule to disambiguate for trees 2014-10-10 15:30:25 -04:00
audit_watch.c audit: invalid op= values for rules 2014-09-23 16:37:53 -04:00
auditfilter.c Merge git://git.infradead.org/users/eparis/audit 2014-10-19 16:25:56 -07:00
auditsc.c Merge git://git.infradead.org/users/eparis/audit 2014-10-19 16:25:56 -07:00
backtracetest.c kernel/backtracetest.c: replace no level printk by pr_info() 2014-06-04 16:54:14 -07:00
bounds.c page-cgroup: get rid of NR_PCG_FLAGS 2014-08-08 15:57:18 -07:00
capability.c CAPABILITIES: remove undefined caps from all processes 2014-07-24 21:53:47 +10:00
cgroup.c Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-10 07:26:02 -04:00
cgroup_freezer.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
compat.c compat: nanosleep: Clarify error handling 2014-09-06 12:58:18 +02:00
configs.c
context_tracking.c sched: stop the unbound recursion in preempt_schedule_context() 2014-10-28 10:46:05 +01:00
cpu.c rcu: Eliminate deadlock between CPU hotplug and expedited grace periods 2014-09-18 16:22:27 -07:00
cpu_pm.c
cpuset.c sched/deadline: Ensure that updates to exclusive cpusets don't break AC 2014-10-28 10:48:00 +01:00
crash_dump.c crash_dump: Make is_kdump_kernel() accessible from modules 2014-08-25 15:42:19 -07:00
cred.c
delayacct.c delayacct: Remove braindamaged type conversions 2014-07-23 10:18:06 -07:00
dma.c
elfcore.c
exec_domain.c kernel/exec_domain.c: code clean-up 2014-06-04 16:54:15 -07:00
exit.c sched, exit: Deal with nested sleeps 2014-10-28 10:55:30 +01:00
extable.c
fork.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
freezer.c freezer: remove obsolete comments in __thaw_task() 2014-10-21 23:44:20 +02:00
futex.c futex: Ensure get_futex_key_refs() always implies a barrier 2014-10-18 09:28:51 -07:00
futex_compat.c
groups.c kernel/groups.c: remove return value of set_groups 2014-04-03 16:21:05 -07:00
hung_task.c kernel/hung_task.c: convert simple_strtoul to kstrtouint 2014-06-04 16:54:15 -07:00
irq_work.c Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
jump_label.c
kallsyms.c kernel/kallsyms.c: use __seq_open_private() 2014-10-14 02:18:16 +02:00
kcmp.c kcmp: fix standard comparison bug 2014-09-10 15:42:12 -07:00
kexec.c kexec: remove the unused function parameter 2014-10-14 02:18:21 +02:00
kmod.c signals: change wait_for_helper() to use kernel_sigaction() 2014-06-06 16:08:12 -07:00
kprobes.c kprobes: Skip kretprobe hit in NMI context to avoid deadlock 2014-08-08 10:38:04 +02:00
ksysfs.c kobject: Make support for uevent_helper optional. 2014-04-25 12:00:49 -07:00
kthread.c kernel/kthread.c: partial revert of 81c98869fa ("kthread: ensure locality of task_struct allocations") 2014-10-09 22:25:51 -04:00
latencytop.c kernel/latencytop.c: convert seq_printf to seq_puts 2014-06-04 16:54:15 -07:00
module-internal.h
module.c sched, modules: Fix nested sleep in add_unformed_module() 2014-10-28 10:56:30 +01:00
module_signing.c
notifier.c kprobes, notifier: Use NOKPROBE_SYMBOL macro in notifier 2014-04-24 10:26:39 +02:00
nsproxy.c namespaces: Use task_lock and not rcu to protect nsproxy 2014-07-29 18:08:50 -07:00
padata.c
panic.c panic: add TAINT_SOFTLOCKUP 2014-08-08 15:57:24 -07:00
params.c kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h> 2014-10-14 02:18:28 +02:00
pid.c
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
profile.c kernel/profile.c: use static const char instead of static char 2014-06-06 16:08:13 -07:00
ptrace.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
range.c
reboot.c kernel: add support for kernel restart handler call chain 2014-09-26 00:00:06 -07:00
relay.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
res_counter.c kernel/res_counter.c: replace simple_strtoull by kstrtoull 2014-06-04 16:54:15 -07:00
resource.c x86: optimize resource lookups for ioremap 2014-10-14 02:18:22 +02:00
seccomp.c Merge branch 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-14 02:27:06 +02:00
signal.c Merge branch 'signal-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc 2014-08-09 09:58:12 -07:00
smp.c Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
smpboot.c sched, smp: Correctly deal with nested sleeps 2014-10-28 10:56:24 +01:00
smpboot.h
softirq.c Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
stacktrace.c
stop_machine.c kernel/stop_machine.c: kernel-doc warning fix 2014-06-04 16:54:15 -07:00
sys.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
sys_ni.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
sysctl.c sched/fair: Fix division by zero sysctl_numa_balancing_scan_size 2014-10-28 10:46:04 +01:00
sysctl_binary.c dmaengine-3.17 2014-10-07 20:39:25 -04:00
system_certificates.S
system_keyring.c KEYS: validate certificate trust only with builtin keys 2014-07-17 09:35:17 -04:00
task_work.c
taskstats.c scheduler: Replace __get_cpu_var with this_cpu_ptr 2014-08-26 13:45:45 -04:00
test_kprobes.c kernel/test_kprobes.c: use current logging functions 2014-08-08 15:57:18 -07:00
torture.c torture: Address race in module cleanup 2014-09-16 13:41:06 -07:00
tracepoint.c tracing: syscall_regfunc() should not skip kernel threads 2014-06-21 00:15:26 -04:00
tsacct.c sched: Make task->start_time nanoseconds based 2014-07-23 10:18:05 -07:00
uid16.c
up.c
user-return-notifier.c scheduler: Replace __get_cpu_var with this_cpu_ptr 2014-08-26 13:45:45 -04:00
user.c kernel/user.c: drop unused field 'files' from user_struct 2014-06-04 16:54:16 -07:00
user_namespace.c proc: constify seq_operations 2014-08-08 15:57:22 -07:00
utsname.c namespaces: Use task_lock and not rcu to protect nsproxy 2014-07-29 18:08:50 -07:00
utsname_sysctl.c sysctl: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
watchdog.c Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
workqueue.c workqueue: Use cond_resched_rcu_qs macro 2014-10-06 05:58:26 -07:00
workqueue_internal.h workqueue: rename manager_mutex to attach_mutex 2014-05-20 10:59:32 -04:00