linux/kernel
Thomas Gleixner 01b8eab1cc rtmutex: Plug slow unlock race
commit 27e35715df upstream.

When the rtmutex fast path is enabled, the slow unlock function can
create the following situation:

spin_lock(foo->m->wait_lock);
foo->m->owner = NULL;
                                rt_mutex_lock(foo->m); <-- fast path
                                free = atomic_dec_and_test(foo->refcnt);
                                rt_mutex_unlock(foo->m); <-- fast path
                                if (free)
                                    kfree(foo);

spin_unlock(foo->m->wait_lock); <--- Use after free.

Plug the race by changing the slow unlock to the following scheme:

     while (!rt_mutex_has_waiters(m)) {
            /* Clear the waiters bit in m->owner */
            clear_rt_mutex_waiters(m);
            owner = rt_mutex_owner(m);
            spin_unlock(m->wait_lock);
            if (cmpxchg(m->owner, owner, 0) == owner)
                   return;
            spin_lock(m->wait_lock);
     }
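
The scheme above can be modeled in userspace as a rough sketch. It is
not the kernel implementation: it assumes C11 atomics and a pthread
mutex standing in for wait_lock, and struct model_rtmutex, the model_*
functions, the nr_waiters counter and the small even owner ids are all
hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 0 of the owner word is the "has waiters" flag. */
#define MODEL_HAS_WAITERS ((uintptr_t)1)

struct model_rtmutex {
	pthread_mutex_t wait_lock;   /* stands in for m->wait_lock */
	_Atomic uintptr_t owner;     /* owner id, waiters flag in bit 0 */
	int nr_waiters;              /* stands in for the waiter queue */
};

/* Fast path lock: succeeds only if the owner word is exactly 0. */
static bool model_fast_lock(struct model_rtmutex *m, uintptr_t self)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&m->owner, &expected, self);
}

/* A blocked locker: mark the waiters bit and enqueue under wait_lock. */
static void model_add_waiter(struct model_rtmutex *m)
{
	pthread_mutex_lock(&m->wait_lock);
	atomic_fetch_or(&m->owner, MODEL_HAS_WAITERS);
	m->nr_waiters++;
	pthread_mutex_unlock(&m->wait_lock);
}

/*
 * Slow path unlock following the scheme in the commit message.
 * Returns true if the lock was released with no waiter queued.
 * Returns false if a waiter is queued; the real code would wake the
 * next waiter at that point, this model only reports it.
 */
static bool model_slow_unlock(struct model_rtmutex *m, uintptr_t self)
{
	pthread_mutex_lock(&m->wait_lock);
	while (m->nr_waiters == 0) {
		/* Clear the waiters bit so the owner word is exactly self. */
		uintptr_t owner = atomic_load(&m->owner) & ~MODEL_HAS_WAITERS;

		assert(owner == self);
		atomic_store(&m->owner, owner);

		/* Drop wait_lock BEFORE giving up ownership. */
		pthread_mutex_unlock(&m->wait_lock);

		/* After a successful exchange, m must not be touched again. */
		if (atomic_compare_exchange_strong(&m->owner, &owner, 0))
			return true;

		/* A waiter set the bit in between: relock and re-check. */
		pthread_mutex_lock(&m->wait_lock);
	}
	pthread_mutex_unlock(&m->wait_lock);
	return false;
}
```

The point of the ordering is that wait_lock is released while the
caller still owns the mutex, so the last access to the structure is the
compare-and-exchange itself. Owner ids in this model must be non-zero
with bit 0 clear, for example 2 and 4.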

So if a new waiter arrives while the owner is in the slow path unlock,
there are two possible situations:

 unlock(wait_lock);
                                        lock(wait_lock);
 cmpxchg(p, owner, 0) == owner
                                        mark_rt_mutex_waiters(lock);
                                        acquire(lock);

Or:

 unlock(wait_lock);
                                        lock(wait_lock);
                                        mark_rt_mutex_waiters(lock);
 cmpxchg(p, owner, 0) != owner
                                        enqueue_waiter();
                                        unlock(wait_lock);
 lock(wait_lock);
 wakeup_next_waiter();
 unlock(wait_lock);
                                        lock(wait_lock);
                                        acquire(lock);

If the fast path is disabled, then the simple

   m->owner = NULL;
   unlock(m->wait_lock);

is sufficient, as all access to m->owner is serialized via
m->wait_lock.

Also document and clarify the wakeup_next_waiter function as suggested
by Oleg Nesterov.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:11:58 -07:00
cpu sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding 2014-01-13 17:38:55 +01:00
debug kgdb/kdb: Fix no KDB config problem 2014-01-25 08:55:09 +01:00
events perf: Fix race in removing an event 2014-06-11 11:54:08 -07:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq genirq: Allow forcing cpu affinity of interrupts 2014-06-07 10:28:08 -07:00
locking rtmutex: Plug slow unlock race 2014-06-30 20:11:58 -07:00
power arm, pm, vmpressure: add missing slab.h includes 2014-02-03 13:24:01 -05:00
printk printk: fix syslog() overflowing user buffer 2014-02-17 12:24:45 -08:00
rcu Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-01-28 08:38:04 -08:00
sched sched: Fix sched_policy < 0 comparison 2014-06-11 11:54:11 -07:00
time tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz() 2014-05-31 13:20:28 -07:00
trace ftrace/module: Hardcode ftrace_module_init() call into load_module() 2014-06-07 10:28:07 -07:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
Makefile KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y 2013-12-13 15:59:11 +00:00
acct.c
async.c
audit.c net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-06-26 15:15:38 -04:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-02-28 04:04:33 -08:00
audit_tree.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit_watch.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
auditfilter.c audit: Update kdoc for audit_send_reply and audit_list_rules_send 2014-03-08 15:31:54 -08:00
auditsc.c auditsc: audit_krule mask accesses need bounds checking 2014-06-16 13:40:33 -07:00
backtracetest.c
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:40:32 -07:00
cgroup.c cgroup: fix a failure path in create_css() 2014-03-18 17:15:36 -04:00
cgroup_freezer.c cgroup: replace cftype->read_seq_string() with cftype->seq_show() 2013-12-05 12:28:04 -05:00
compat.c
configs.c
context_tracking.c context_tracking: Wrap static key check into more intuitive function name 2013-12-02 20:43:14 +01:00
cpu.c sched: Fix hotplug vs. set_cpus_allowed_ptr() 2014-06-11 11:54:11 -07:00
cpu_pm.c
cpuset.c cpuset: fix a race condition in __cpuset_node_allowed_softwall() 2014-02-27 09:39:54 -05:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c exit: call disassociate_ctty() before exit_task_namespaces() 2014-04-26 17:19:07 -07:00
extable.c kernel/extable: fix address-checks for core_kernel and init areas 2013-11-28 09:49:41 -08:00
fork.c ptrace: fix fork event messages across pid namespaces 2014-06-30 20:11:54 -07:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex.c futex: Make lookup_pi_state more robust 2014-06-07 10:28:29 -07:00
futex_compat.c
groups.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
hrtimer.c hrtimer: Set expiry time before switch_hrtimer_base() 2014-06-07 10:28:12 -07:00
hung_task.c hung_task: Display every hung task warning 2014-01-25 12:13:33 +01:00
irq_work.c
itimer.c
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c
kcmp.c
kexec.c powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode 2014-06-07 10:28:28 -07:00
kmod.c execve: use 'struct filename *' for executable name passing 2014-02-05 12:54:53 -08:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c kdump: fix exported size of vmcoreinfo note 2014-01-23 16:37:03 -08:00
kthread.c kthread: fix return value of kthread_create() upon SIGKILL. 2014-06-30 20:11:53 -07:00
latencytop.c
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c module: remove warning about waiting module removal. 2014-06-07 10:28:11 -07:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
notifier.c
nsproxy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
padata.c padata: Fix wrong usage of rcu_dereference() 2013-12-05 21:28:42 +08:00
panic.c panic: Make panic_timeout configurable 2013-11-26 12:12:26 +01:00
params.c params: improve standard definitions 2013-12-04 14:09:46 +10:30
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-26 17:19:04 -07:00
posix-cpu-timers.c posix-timers: Convert abuses of BUG_ON to WARN_ON 2013-12-09 16:56:29 +01:00
posix-timers.c
profile.c mm: fix GFP_THISNODE callers and clarify 2014-03-10 17:26:19 -07:00
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-13 12:09:33 +09:00
range.c
reboot.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
relay.c
res_counter.c memcg: reduce function dereference 2013-09-12 15:38:02 -07:00
resource.c
seccomp.c
signal.c kernel/signal.c: change do_signal_stop/do_sigaction to use while_each_thread() 2014-01-23 16:37:02 -08:00
smp.c kernel/smp.c: remove cpumask_ipi 2014-01-30 16:56:54 -08:00
smpboot.c
smpboot.h
softirq.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-31 09:02:51 -08:00
stacktrace.c
stop_machine.c stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-03-11 11:33:47 +01:00
sys.c kernel/sys.c: k_getrusage() can use while_each_thread() 2014-01-23 16:37:02 -08:00
sys_ni.c
sysctl.c hung_task: check the value of "sysctl_hung_task_timeout_sec" 2014-05-06 07:59:35 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c task_work: documentation 2013-09-11 15:58:27 -07:00
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c timer: Prevent overflow in apply_slack 2014-06-07 10:28:09 -07:00
tracepoint.c tracepoint: Do not waste memory on mods with no tracepoints 2014-05-31 13:20:28 -07:00
tsacct.c
uid16.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
up.c kernel: provide a __smp_call_function_single stub for !CONFIG_SMP 2013-11-15 09:32:22 +09:00
user-return-notifier.c
user.c KEYS: fix uninitialized persistent_keyring_register_sem 2013-12-13 15:59:11 +00:00
user_namespace.c user namespace: fix incorrect memory barriers 2014-04-26 17:19:02 -07:00
utsname.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
utsname_sysctl.c
watchdog.c watchdog: update watchdog_thresh properly 2013-09-24 17:00:25 -07:00
workqueue.c workqueue: make rescuer_thread() empty wq->maydays list before exiting 2014-06-07 10:28:22 -07:00
workqueue_internal.h