commit b93e0b8fa8 upstream.
An irq work can be handled from two places: from the tick if the work
carries the "lazy" flag and the tick is periodic, or from a self IPI.
We merge all these works in a single list and use a per-CPU latch
to avoid raising a self-IPI when one is already pending.
Now we could do away with this ugly latch if the list only contained
non-lazy works. Just enqueueing a work on an empty list would be enough
to know if we need to raise an IPI or not.
Also, we are going to implement remote irq work queuing. Then the per-CPU
latch would need to become atomic in the global scope. That's too bad,
because, here as well, just enqueueing a work on an empty list of
non-lazy works would be enough to know if we need to raise an IPI or not.
So let's take a way out of this: split the works into two distinct lists,
one for the works that can be handled by the next tick and another
one for those handled by the IPI. Just checking if the latter is empty
when we queue a new work is enough to know if we need to raise an IPI.
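For illustration, the enqueue path then boils down to something like this
sketch (llist_add() returns true when it put the first node on a previously
empty list; the claim/pending handling is omitted and names are approximate):

  static DEFINE_PER_CPU(struct llist_head, raised_list);
  static DEFINE_PER_CPU(struct llist_head, lazy_list);

  bool irq_work_queue(struct irq_work *work)
  {
  	if (work->flags & IRQ_WORK_LAZY) {
  		/* handled by the next tick; IPI only if the tick is stopped */
  		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
  		    tick_nohz_tick_stopped())
  			arch_irq_work_raise();
  	} else {
  		/* first work on an empty raised_list needs an IPI */
  		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
  			arch_irq_work_raise();
  	}
  	return true;
  }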
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Conflicts:
kernel/irq_work.c
Merged in some changes from 4.0-rt that added the irq_work_tick()
code; it also has the raised_list called from hardirq context and
the lazy_list always from softirq context (which is threaded on RT).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
commit c5c38ef3d7 upstream.
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Let's introduce arch_irq_work_has_interrupt(), which archs can override
to tell about their support for this ability.
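The shape of this is an arch header overriding a generic fallback,
roughly:

  /* asm-generic fallback: no self-interrupt available */
  static inline bool arch_irq_work_has_interrupt(void)
  {
  	return false;
  }

  /* an arch with a self-IPI (x86 and friends) provides instead: */
  static inline bool arch_irq_work_has_interrupt(void)
  {
  	return true;
  }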
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
A task being ticked and trying to shut the tick down will fail because
it has just awakened ksoftirqd; subtract ksoftirqd from nr_running.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The problem:
On -RT, an emulated LAPIC timer instance has the following path:
1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled
This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.
The solution:
Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.
Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.
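Conceptually, a simple waitqueue keeps the internal lock a raw_spinlock_t,
which stays a true spinning lock on -RT; a rough sketch of the idea
(illustrative only, not the exact -RT API):

  struct swaiter {
  	struct task_struct *task;
  	struct list_head node;
  };

  struct swait_head {
  	raw_spinlock_t lock;	/* never converted to a sleeping lock */
  	struct list_head list;
  };

  /* safe from hard interrupt context: only a raw lock is taken */
  static void swait_wake(struct swait_head *h)
  {
  	struct swaiter *w;
  	unsigned long flags;

  	raw_spin_lock_irqsave(&h->lock, flags);
  	if (!list_empty(&h->list)) {
  		w = list_first_entry(&h->list, struct swaiter, node);
  		wake_up_process(w->task);
  	}
  	raw_spin_unlock_irqrestore(&h->lock, flags);
  }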
cyclictest command line:
This patch reduces the average latency in my tests from 14us to 11us.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since the lapic timer handler only wakes up a simple waitqueue,
it can be executed from hardirq context.
Also handle the case where hrtimer_start_expires() fails with -ETIME,
by injecting the interrupt into the guest immediately.
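In code this amounts to roughly the following sketch (apic_timer_expired()
stands in for whatever injects the timer interrupt; name assumed):

  	ret = hrtimer_start_expires(&apic->lapic_timer.timer,
  				    HRTIMER_MODE_ABS);
  	if (ret == -ETIME) {
  		/* expiry already in the past: inject right away */
  		apic_timer_expired(&apic->lapic_timer.timer);
  	}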
Reduces average cyclictest latency by 3us.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Invoking NO_HZ's irq_work callback from the timer irq does not work very
well if the callback decides to invoke hrtimer_cancel():
|hrtimer_try_to_cancel+0x55/0x5f
|hrtimer_cancel+0x16/0x28
|tick_nohz_restart+0x17/0x72
|__tick_nohz_full_check+0x8e/0x93
|nohz_full_kick_work_func+0xe/0x10
|irq_work_run_list+0x39/0x57
|irq_work_tick+0x60/0x67
|update_process_times+0x57/0x67
|tick_sched_handle+0x4a/0x59
|tick_sched_timer+0x3b/0x64
|__run_hrtimer+0x7a/0x149
|hrtimer_interrupt+0x1cc/0x2c5
and here we deadlock while waiting for the lock which we are holding.
To fix this I'm doing the same thing that upstream is doing: use the
dedicated irq_work IRQ and run from it only the work marked as "hirq",
which should only be the FULL_NO_HZ related work.
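A sketch of the resulting dispatch in irq_work_queue() (IRQ_WORK_HARD_IRQ
is the flag used by the -RT series; details approximate):

  	struct llist_head *list;

  	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
  	    !(work->flags & IRQ_WORK_HARD_IRQ))
  		/* on RT everything else runs from the threaded softirq */
  		list = this_cpu_ptr(&lazy_list);
  	else if (work->flags & IRQ_WORK_LAZY)
  		list = this_cpu_ptr(&lazy_list);
  	else
  		list = this_cpu_ptr(&raised_list);

  	if (llist_add(&work->llnode, list) &&
  	    list == this_cpu_ptr(&raised_list))
  		arch_irq_work_raise();	/* only "hirq" work uses the IPI */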
Cc: stable-rt@vger.kernel.org
Reported-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ Added back in_irq() check for non PREEMPT_RT configs ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
For a vanilla kernel we don't need to invoke rcu_read_lock/unlock_bh()
explicitly to mark an RCU-bh critical section in softirq context,
because bh is already disabled in this case. But for an RT kernel,
the commit ("rcu: Merge RCU-bh into RCU-preempt") implements
RCU-bh in terms of RCU-preempt. So we have to use
rcu_read_lock/unlock_bh() to mark an RCU-bh critical section even in
a softirq context. Otherwise we will get a call trace like this:
include/linux/netpoll.h:90 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by irq/177-eth0_g0/129:
#0: (&per_cpu(local_softirq_locks[i], __cpu).lock){+.+...}, at: [<8002f544>] do_current_softirqs+0x12c/0x5ec
stack backtrace:
CPU: 0 PID: 129 Comm: irq/177-eth0_g0 Not tainted 3.14.23 #11
[<80018c0c>] (unwind_backtrace) from [<800138b0>] (show_stack+0x20/0x24)
[<800138b0>] (show_stack) from [<8075c3bc>] (dump_stack+0x84/0xd0)
[<8075c3bc>] (dump_stack) from [<8008111c>] (lockdep_rcu_suspicious+0xe8/0x11c)
[<8008111c>] (lockdep_rcu_suspicious) from [<805e94e8>] (dev_gro_receive+0x240/0x724)
[<805e94e8>] (dev_gro_receive) from [<805e9c34>] (napi_gro_receive+0x3c/0x1e8)
[<805e9c34>] (napi_gro_receive) from [<804b01ac>] (gfar_clean_rx_ring+0x2d4/0x624)
[<804b01ac>] (gfar_clean_rx_ring) from [<804b078c>] (gfar_poll_rx_sq+0x58/0xe8)
[<804b078c>] (gfar_poll_rx_sq) from [<805eada8>] (net_rx_action+0x1c8/0x418)
[<805eada8>] (net_rx_action) from [<8002f62c>] (do_current_softirqs+0x214/0x5ec)
[<8002f62c>] (do_current_softirqs) from [<8002fa88>] (__local_bh_enable+0x84/0x9c)
[<8002fa88>] (__local_bh_enable) from [<8002fab8>] (local_bh_enable+0x18/0x1c)
[<8002fab8>] (local_bh_enable) from [<80093924>] (irq_forced_thread_fn+0x50/0x74)
[<80093924>] (irq_forced_thread_fn) from [<80093c30>] (irq_thread+0x158/0x1c4)
[<80093c30>] (irq_thread) from [<800555b8>] (kthread+0xd4/0xe8)
[<800555b8>] (kthread) from [<8000ee88>] (ret_from_fork+0x14/0x20)
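The fix is simply to mark the critical section explicitly; taking netpoll's
npinfo dereference as the example (sketch):

  	struct netpoll_info *npinfo;

  	rcu_read_lock_bh();
  	npinfo = rcu_dereference_bh(skb->dev->npinfo);
  	/* ... GRO/netpoll work under the RCU-bh read lock ... */
  	rcu_read_unlock_bh();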
Signed-off-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This reverts commit 891f510568343d93c5aa2f477b6bebe009b48f05.
An issue has arisen: if an rt_mutex (a spin_lock converted to a mutex
in PREEMPT_RT) is taken in hard interrupt context, it can cause
a false deadlock detection and trigger a BUG_ON() from the return
value of task_blocks_on_rt_mutex() in rt_spin_lock_slowlock().
The problem is this:
   CPU0                          CPU1
   ----                          ----
   spin_lock(A)
                                 spin_lock(A)
                                 [ blocks, but spins as owner on
                                   CPU 0 is running ]
                                 <interrupt>
                                 spin_trylock(B)
                                 [ succeeds ]
   spin_lock(B)
   <blocks>
Now the deadlock detection triggers and follows the locking:
Task X (on CPU0) blocked on spinlock B owned by task Y on
CPU1 (via the interrupt taking it with a try lock)
The owner of B (Y) is blocked on spin_lock A (still spinning)
A is owned by task X (self). DEADLOCK detected! BUG_ON triggered.
This was caused by the code that tries not to raise the softirq
unconditionally, to allow NO_HZ_FULL to work. Unfortunately, reverting
that patch causes NO_HZ_FULL to break again, but that's still better
than triggering a BUG_ON().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
kernel/timer.c
A trivially repeatable deadlock is cured by enabling the lockdep code in
btrfs_clear_path_blocking(), as suggested by Chris Mason. He also
suggested restricting the blocking reader count to one, and not allowing
a spinning reader while a blocking reader exists. This has proven to
be unnecessary; the strict lock order enforcement is enough... or
rather, that's my box's opinion after long hours of hard pounding.
Note: the extent-tree.c bit is an additional recommendation from Chris
Mason, split into a separate patch after discussion.
Link: http://lkml.kernel.org/r/1414913478.5380.114.camel@marge.simpson.net
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Clark Williams <williams@redhat.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: 18d8cb64c9
The __run_timers() function currently steps through the list one jiffy at
a time in order to update the timer wheel. However, if the timer wheel
is empty, no adjustment is needed other than updating ->timer_jiffies.
Therefore, just before we add a timer to an empty timer wheel, we should
mark the timer wheel as being up to date. This marking will reduce (and
perhaps eliminate) the jiffy-stepping that a future __run_timers() call
will need to do in response to some future timer posting or migration.
This commit therefore updates ->timer_jiffies for this case.
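In code this boils down to a small helper, roughly (names per this series,
counter updates omitted):

  static bool catchup_timer_jiffies(struct tvec_base *base)
  {
  	/* empty wheel: nothing to step through, declare it current */
  	if (!base->all_timers) {
  		base->timer_jiffies = jiffies;
  		return true;
  	}
  	return false;
  }

  static void internal_add_timer(struct tvec_base *base,
  				 struct timer_list *timer)
  {
  	(void)catchup_timer_jiffies(base);	/* first add to empty wheel */
  	__internal_add_timer(base, timer);
  }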
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: 16d937f880
The __run_timers() function currently steps through the list one jiffy at
a time in order to update the timer wheel. However, if the timer wheel
is empty, no adjustment is needed other than updating ->timer_jiffies.
Therefore, if we just emptied the timer wheel, for example, by deleting
the last timer, we should mark the timer wheel as being up to date.
This marking will reduce (and perhaps eliminate) the jiffy-stepping that
a future __run_timers() call will need to do in response to some future
timer posting or migration. This commit therefore catches ->timer_jiffies
up for this case.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: d550e81dc0
The __run_timers() function currently steps through the list one jiffy at
a time in order to update the timer wheel. However, if the timer wheel
is empty, no adjustment is needed other than updating ->timer_jiffies.
In this case, which is likely to be common for NO_HZ_FULL kernels, the
kernel currently incurs a large latency for no good reason. This commit
therefore short-circuits this case.
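i.e., at the top of __run_timers(), something like this (using the helper
sketched above):

  	spin_lock_irq(&base->lock);
  	if (catchup_timer_jiffies(base)) {
  		/* wheel is empty: skip the jiffy-stepping entirely */
  		spin_unlock_irq(&base->lock);
  		return;
  	}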
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: fff421580f
Currently, the tvec_base structure's ->active_timers field tracks only
the non-deferrable timers, which means that even if ->active_timers is
zero, there might well be deferrable timers in the list. This commit
therefore adds an ->all_timers field to track all the timers, whether
deferrable or not.
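An abbreviated sketch of the resulting structure:

  struct tvec_base {
  	spinlock_t lock;
  	struct timer_list *running_timer;
  	unsigned long timer_jiffies;
  	unsigned long active_timers;	/* non-deferrable timers only */
  	unsigned long all_timers;	/* every timer, deferrable or not */
  	/* tv1..tv5 cascade arrays omitted */
  };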
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT_FULL, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT_FULL
conditionals.
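i.e., roughly (macro usage abbreviated):

  #ifndef CONFIG_PREEMPT_RT_FULL
  /*
   * On PREEMPT_RT_FULL spinlocks are preemptible, so the hard-irq
   * context tests are neither run nor defined.
   */
  GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_rlock)
  GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_wlock)
  GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_soft_spin)
  #endif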
Cc: stable-rt@vger.kernel.org
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On RT, the spin lock in pkg_temp_thermal_platform_thermal_notify() will
call schedule() while we run in irq context.
[<ffffffff816850ac>] dump_stack+0x4e/0x8f
[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
[<ffffffff816896b4>] __schedule+0x5b4/0x700
[<ffffffff8168982a>] schedule+0x2a/0x90
[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
Let's defer the work to a kthread.
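A sketch of the deferral using the simple-work (swork) framework from this
series (API names as in the -RT patches, treat details as approximate):

  static struct swork_event notify_work;

  /* runs in kthread context, sleeping locks are fine here */
  static void platform_thermal_notify_work(struct swork_event *event)
  {
  	/* the original notify body moves here */
  }

  /* called from the thermal hard interrupt */
  static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
  {
  	swork_queue(&notify_work);	/* just kick the kthread */
  	return 0;
  }

  static int thermal_notify_work_init(void)
  {
  	int err = swork_get();		/* bring up the worker kthread */

  	if (err)
  		return err;
  	INIT_SWORK(&notify_work, platform_thermal_notify_work);
  	return 0;
  }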
Cc: stable-rt@vger.kernel.org
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: reorder init/deinit position. TODO: flush swork on exit]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
If the caller already holds the mutex, task_blocks_on_rt_mutex() returns
-EDEADLK and we proceed directly to rt_mutex_handle_deadlock(), where
it's instant game over.
Let ww_mutexes return -EDEADLK/-EALREADY as they want to instead.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This behaviour is required by cpufreq, and its logic is "okay": it does a
read_lock followed by a try_read_lock.
Lockdep warns if one tries to take a read_lock twice, on both -RT and
vanilla, so it should be good. We still only allow multiple readers as
long as they belong to the same process.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Provides a PREEMPT_RT_FULL safe framework for enqueuing callbacks from
irq context. The callbacks are executed in kthread context.
Based on wait-simple.
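Internally the idea is a kthread draining a lock-less list while sleeping
on a simple waitqueue; a conceptual sketch (types and names illustrative,
not the exact implementation):

  static int swork_kthread(void *arg)
  {
  	struct sworker *worker = arg;
  	struct llist_node *n;
  	struct swork_event *ev;

  	for (;;) {
  		/* simple waitqueue: raw lock, wakeable from hardirq */
  		swait_event_interruptible(worker->wq,
  			!llist_empty(&worker->events) ||
  			kthread_should_stop());
  		if (kthread_should_stop())
  			break;

  		/* drain the events queued from irq context */
  		n = llist_del_all(&worker->events);
  		while (n) {
  			ev = llist_entry(n, struct swork_event, item);
  			n = llist_next(n);
  			ev->func(ev);	/* runs in kthread context */
  		}
  	}
  	return 0;
  }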
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On 3.14-rt we see the following trace on Canoe Pass for
SCSI_ISCI "Intel(R) C600 Series Chipset SAS Controller"
when the sas qc_issue handler is run:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:905
in_atomic(): 0, irqs_disabled(): 1, pid: 432, name: udevd
CPU: 11 PID: 432 Comm: udevd Not tainted 3.14.28-rt22 #2
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
ffff880fab500000 ffff880fa9f239c0 ffffffff81a2d273 0000000000000000
ffff880fa9f239d8 ffffffff8107f023 ffff880faac23dc0 ffff880fa9f239f0
ffffffff81a33cc0 ffff880faaeb1400 ffff880fa9f23a40 ffffffff815de891
Call Trace:
[<ffffffff81a2d273>] dump_stack+0x4e/0x7a
[<ffffffff8107f023>] __might_sleep+0xe3/0x160
[<ffffffff81a33cc0>] rt_spin_lock+0x20/0x50
[<ffffffff815de891>] isci_task_execute_task+0x171/0x2f0 <-----
[<ffffffff815cfecb>] sas_ata_qc_issue+0x25b/0x2a0
[<ffffffff81606363>] ata_qc_issue+0x1f3/0x370
[<ffffffff8160c600>] ? ata_scsi_invalid_field+0x40/0x40
[<ffffffff8160c8f5>] ata_scsi_translate+0xa5/0x1b0
[<ffffffff8160efc6>] ata_sas_queuecmd+0x86/0x280
[<ffffffff815ce446>] sas_queuecommand+0x196/0x230
[<ffffffff81081fad>] ? get_parent_ip+0xd/0x50
[<ffffffff815b05a4>] scsi_dispatch_cmd+0xb4/0x210
[<ffffffff815b7744>] scsi_request_fn+0x314/0x530
and gdb shows:
(gdb) list * isci_task_execute_task+0x171
0xffffffff815ddfb1 is in isci_task_execute_task (drivers/scsi/isci/task.c:138).
133 dev_dbg(&ihost->pdev->dev, "%s: num=%d\n", __func__, num);
134
135 for_each_sas_task(num, task) {
136 enum sci_status status = SCI_FAILURE;
137
138 spin_lock_irqsave(&ihost->scic_lock, flags); <-----
139 idev = isci_lookup_device(task->dev);
140 io_ready = isci_device_io_ready(idev, task);
141 tag = isci_alloc_tag(ihost);
142 spin_unlock_irqrestore(&ihost->scic_lock, flags);
(gdb)
In addition to the scic_lock, the function also contains locking of
the task_state_lock -- which is clearly not a candidate for raw lock
conversion. As can be seen by the comment nearby, we really should
be running the qc_issue code with interrupts enabled anyway.
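The -RT way out is the *_nort() variants, which disable interrupts on
mainline but stay preemptible on RT; a sketch of the execute path (call
shown as in sas_ata_qc_issue(), approximately):

  	unsigned long flags;

  	/* no-op on PREEMPT_RT_FULL, real local_irq_save() otherwise */
  	local_irq_save_nort(flags);
  	ret = i->dft->lldd_execute_task(task, 1, GFP_ATOMIC);
  	local_irq_restore_nort(flags);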
Cc: stable-rt@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In k{un}map_coherent, pagefault_disable and pagefault_enable are called
respectively, but k{un}map_coherent needs preemption disabled according to
commit f8829caee3 ("[MIPS] Fix aliasing bug
in copy_to_user_page / copy_from_user_page") to avoid a dcache alias on COW.
k{un}map_coherent are only called when cpu_has_dc_aliases == 1, i.e. with a
VIPT cache. However, most modern MIPS processors actually have a PIPT dcache
without the dcache alias issue. In such a case, k{un}map_atomic will be
called with preemption enabled.
To fix this, we replace pagefault_* with the raw versions in
k{un}map_coherent, which disable preemption; otherwise the following kernel
panic may be caught:
CPU 0 Unable to handle kernel paging request at virtual address fffffffffffd5000, epc == ffffffff80122c00, ra == ffffffff8011fbcc
Oops[#1]:
CPU: 0 PID: 409 Comm: runltp Not tainted 3.14.17-rt5 #1
task: 980000000fa936f0 ti: 980000000eed0000 task.ti: 980000000eed0000
$ 0 : 0000000000000000 000000001400a4e1 fffffffffffd5000 0000000000000001
$ 4 : 980000000cded000 fffffffffffd5000 980000000cdedf00 ffffffffffff00fe
$ 8 : 0000000000000000 ffffffffffffff00 000000000000000d 0000000000000004
$12 : 980000000eed3fe0 000000000000a400 ffffffffa00ae278 0000000000000000
$16 : 980000000cded000 000000726eb855c8 98000000012ccfe8 ffffffff8095e0c0
$20 : ffffffff80ad0000 ffffffff8095e0c0 98000000012d0bd8 980000000fb92000
$24 : 0000000000000000 ffffffff80177fb0
$28 : 980000000eed0000 980000000eed3b60 980000000fb92060 ffffffff8011fbcc
Hi : 000000000002cb02
Lo : 000000000000ee56
epc : ffffffff80122c00 copy_page+0x38/0x548
Not tainted
ra : ffffffff8011fbcc copy_user_highpage+0x16c/0x180
Status: 1400a4e3 KX SX UX KERNEL EXL IE
Cause : 10800408
BadVA : fffffffffffd5000
PrId : 00010000 (MIPS64R2-generic)
Modules linked in: i2c_piix4 i2c_core uhci_hcd
Process runltp (pid: 409, threadinfo=980000000eed0000, task=980000000fa936f0, tls=000000fff7756700)
Stack : 98000000012ccfe8 980000000eeb7ba8 980000000ecc7508 000000000666da5b
000000726eb855c8 ffffffff802156e0 000000726ea4a000 98000000010007e0
980000000fb92060 0000000000000000 0000000000000000 6db6db6db6db6db7
0000000000000080 000000726eb855c8 980000000fb92000 980000000eeeec28
980000000ecc7508 980000000fb92060 0000000000000001 00000000000000a9
ffffffff80995e60 ffffffff80218910 000000001400a4e0 ffffffff804efd24
980000000ee25b90 ffffffff8079cec4 ffffffff8079d49c ffffffff80979658
000000000666da5b 980000000eeb7ba8 000000726eb855c8 00000000000000a9
980000000fb92000 980000000fa936f0 980000000eed3eb0 0000000000000001
980000000fb92088 0000000000030002 980000000ecc7508 ffffffff8011ecd0
...
Call Trace:
[<ffffffff80122c00>] copy_page+0x38/0x548
[<ffffffff8011fbcc>] copy_user_highpage+0x16c/0x180
[<ffffffff802156e0>] do_wp_page+0x658/0xcd8
[<ffffffff80218910>] handle_mm_fault+0x7d8/0x1070
[<ffffffff8011ecd0>] __do_page_fault+0x1a0/0x508
[<ffffffff80104d84>] resume_userspace_check+0x0/0x10
Or a random segmentation fault may happen.
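The change itself is small; a sketch of the kmap_coherent() side
(raw_pagefault_enable() mirrors it in kunmap_coherent()):

  	/* was: pagefault_disable(); -- no preempt-off guarantee on -RT */
  	raw_pagefault_disable();
  	/* ... set up the fixmap mapping for the aliased page ... */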
Cc: stable-rt@vger.kernel.org
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Both pi_stress and sigwaittest from rt-tests show a performance gain with
__HAVE_ARCH_CMPXCHG. Testing result on coretile_express_a9x4:
pi_stress -p 99 --duration=300 (on linux-3.4-rc5; bigger is better)
vanilla: Total inversion performed: 5493381
patched: Total inversion performed: 5621746
sigwaittest -p 99 -l 100000 (on linux-3.4-rc5-rt6; less is better)
3.4-rc5-rt6: Min 24, Cur 27, Avg 30, Max 98
patched: Min 19, Cur 21, Avg 23, Max 96
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ARM UP implementation of futex_atomic_cmpxchg_inatomic() assumes that
pagefault_disable() inherits a preempt disabled section. This assumption
is true for mainline, but -RT reverts this and allows preemption in
pagefault disabled regions.
The code sequence of futex_atomic_cmpxchg_inatomic():
| x = *futex;
| if (x == oldval)
| *futex = newval;
The problem occurs if the code is preempted after reading the futex value
or after comparing it with x. While preempted, the futex owner has to be
scheduled, which then releases the lock (in userland, because it has no
waiter yet). Once the code is back on the CPU, it overwrites the futex
value with the old PID and the waiter bit set.
The workaround is to explicitly disable preemption to avoid the
described race window.
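As a sketch of the UP futex_atomic_cmpxchg_inatomic() with the workaround
applied (access checking abbreviated):

  static inline int
  futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
  			      u32 oldval, u32 newval)
  {
  	int ret = 0;
  	u32 val;

  	preempt_disable();	/* -RT: pagefault_disable() no longer implies this */
  	if (get_user(val, uaddr))
  		ret = -EFAULT;
  	else if (val == oldval && put_user(newval, uaddr))
  		ret = -EFAULT;
  	*uval = val;
  	preempt_enable();
  	return ret;
  }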
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Probably happens on all ARM, with
CONFIG_PREEMPT_RT_FULL
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main() {
	*((char*)0xc0001000) = 0;
}
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. Under the above conditions, the correct result is
that the test case receives a "segmentation fault" and exits, instead of
producing the splat above.
The root cause is commit 02fe2845d6 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead. But the author did not enable irqs in the
translation/section permission fault handlers. ARM disables irqs when it
enters exception/interrupt mode; if the kernel doesn't re-enable them,
they remain disabled during translation/section permission faults.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture-independent code, and we've seen no need on any
other arch to convert the siglock to a raw lock, we can conclude that we
should enable irqs for the ARM translation/section permission fault
handlers.
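The fix amounts to re-enabling interrupts in those handlers when the
faulting context had them enabled, along these lines:

  	/* at the top of do_translation_fault() and do_sect_fault() */
  	if (interrupts_enabled(regs))
  		local_irq_enable();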
Cc: stable-rt@vger.kernel.org
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Shrug. Lots of hobbyists have a beast in their basement, right?
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The functions ww_mutex_lock_interruptible() and ww_mutex_lock() should
return -EDEADLK when faced with a deadlock. To do so, the parameter
detect_deadlock in rt_mutex_slowlock() must be TRUE.
This patch corrects potential deadlocks when running PREEMPT_RT with
the nouveau driver.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Gustavo Bittencourt <gbitten@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Using mutex_acquire_nest() as used in __ww_mutex_lock() fixes the
splat below. Remove superfluous line break in __ww_mutex_lock()
as well.
|=============================================
|[ INFO: possible recursive locking detected ]
|3.14.4-rt5 #26 Not tainted
|---------------------------------------------
|Xorg/4298 is trying to acquire lock:
| (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
|but task is already holding lock:
| (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
|other info that might help us debug this:
| Possible unsafe locking scenario:
| CPU0
| ----
| lock(reservation_ww_class_mutex);
| lock(reservation_ww_class_mutex);
|
| *** DEADLOCK ***
|
| May be due to missing lock nesting notation
|
|3 locks held by Xorg/4298:
| #0: (&cli->mutex){+.+.+.}, at: [<ffffffffa02b597b>] nouveau_abi16_get+0x2b/0x100 [nouveau]
| #1: (reservation_ww_class_acquire){+.+...}, at: [<ffffffffa0160cd2>] drm_ioctl+0x4d2/0x610 [drm]
| #2: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
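The fix boils down to this sketch in __ww_mutex_lock() (the nest-lock
argument comes from the acquire context; exact expression approximate):

  	/* was: mutex_acquire(&lock->base.dep_map, 0, 0, _RET_IP_); */
  	mutex_acquire_nest(&lock->base.dep_map, 0, 0,
  			   ww_ctx ? &ww_ctx->dep_map : NULL, _RET_IP_);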
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In task_blocks_on_lock, there's a NULL check on pi_blocked_on
of the task_struct. This pointer can encode the fact that the
task that contains the pointer is waking (preventing requeuing),
and is therefore non-NULL. Use the inline function to avoid
dereferencing an invalid "pointer".
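The helper in question looks like this sketch (PI_WAKEUP_INPROGRESS is the
magic non-null value used by the -RT requeue code):

  #define PI_WAKEUP_INPROGRESS	((struct rt_mutex_waiter *) 1)

  static inline bool rt_mutex_real_waiter(struct rt_mutex_waiter *waiter)
  {
  	/* filter out the magic "waking, don't requeue" encoding */
  	return waiter && waiter != PI_WAKEUP_INPROGRESS;
  }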
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Reported-by: Ben Shelton <ben.shelton@ni.com>
Reviewed-by: T Makphaibulchoke <tmac@hp.com>
Tested-by: T Makphaibulchoke <tmac@hp.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With task_blocks_on_rt_mutex() returning early -EDEADLK we never add the
waiter to the waitqueue. Later, we try to remove it via remove_waiter()
and go boom in rt_mutex_top_waiter() because rb_entry() gives a NULL
pointer.
Tested on v3.18-RT, where rtmutex is used for the regular mutex and I
tried to take one twice in a row.
Not sure when this started but I guess 397335f00 ("rtmutex: Fix deadlock
detector for real") or commit 3d5c9340 ("rtmutex: Handle deadlock
detection smarter").
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: af54d6a1c3
futex_lock_pi_atomic() is a maze of retry hoops and loops.
Reduce it to simple and understandable states:
First step is to lookup existing waiters (state) in the kernel.
If there is an existing waiter, validate it and attach to it.
If there is no existing waiter, check the user space value:
If the TID encoded in the user space value is 0, take over the futex
preserving the owner died bit.
If the TID encoded in the user space value is != 0, lookup the owner
task, validate it and attach to it.
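The resulting structure, roughly (helper names attach_to_pi_state(),
attach_to_pi_owner() and lock_pi_update_atomic() per this change; details
abbreviated):

  	/* 1) kernel state first: an existing waiter implies pi_state */
  	top_waiter = futex_top_waiter(hb, key);
  	if (top_waiter)
  		return attach_to_pi_state(uval, top_waiter->pi_state, ps);

  	/* 2) no waiter: the user space value decides */
  	if (!(uval & FUTEX_TID_MASK)) {
  		/* TID == 0: take it over, preserving the owner died bit */
  		newval = (uval & FUTEX_OWNER_DIED) | vpid;
  		return lock_pi_update_atomic(uaddr, uval, newval);
  	}

  	/* TID != 0: look up, validate and attach to the owner task */
  	return attach_to_pi_owner(uval, key, ps);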
Reduces text size by 128 bytes on x86_64.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Cc: Darren Hart <darren@dvhart.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406131137020.5170@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: 04e1b2e52b
We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the code which attaches the first
waiter to the owner into a separate function. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.271300614@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: e60cbc5cea
We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the waiter verification into a separate
function. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.180458410@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: bd1dbcc67c
No point in open coding the same function again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.092947239@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: ccf9e6a80d
The kernel tries to atomically unlock the futex without checking
whether there is kernel state associated with the futex.
So if user space manipulated the user space value, this will leave
kernel internal state around, associated with the owner task.
For robustness' sake, look up first whether there are waiters on the
futex. If there are waiters, wake the top priority waiter with all the
proper sanity checks applied.
If there are no waiters, do the atomic release. We do not have to
preserve the waiters bit in this case, because a potentially incoming
waiter is blocked on the hb->lock and will acquire the futex
atomically. Nor do we have to preserve the owner died bit; the caller
is the owner and was supposed to clean up the mess.
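Sketch of the reworked unlock path (signatures abbreviated):

  	/* always look for kernel state (waiters) first */
  	top_waiter = futex_top_waiter(hb, &key);
  	if (top_waiter) {
  		/* wake it with the full sanity checks applied */
  		ret = wake_futex_pi(uaddr, uval, top_waiter);
  	} else {
  		/*
  		 * No waiters: release atomically. No need to preserve
  		 * the waiters or owner-died bits; a new waiter blocks
  		 * on hb->lock, and the caller cleans up its own mess.
  		 */
  		if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
  			ret = -EFAULT;	/* fault, handled by retry logic */
  	}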
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.016987332@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: 67792e2cab
In case the deadlock detector is enabled, we follow the lock chain to
the end in rt_mutex_adjust_prio_chain, even if we could stop earlier
due to the priority/waiter constellation.
But once we are no longer the top priority waiter in a certain step,
or the task holding the lock already has the same priority, there is
no point in dequeueing and enqueueing along the lock chain, as nothing
changes at all.
So stop the queueing at this point.
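The resulting check in rt_mutex_adjust_prio_chain(), roughly:

  	/*
  	 * No priority change: nothing along the chain can change
  	 * either, so stop requeueing; keep walking only if deadlock
  	 * detection is required.
  	 */
  	if (waiter->prio == task->prio) {
  		if (!detect_deadlock)
  			goto out_unlock_pi;
  		else
  			requeue = false;
  	}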
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20140522031950.280830190@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: 8930ed80f9
The conditions under which deadlock detection is conducted are unclear
and undocumented.
Add constants instead of using 0/1 and provide a selection function
which hides the additional debug dependency from the calling code.
Add comments where needed.
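As a sketch of what this introduces:

  /*
   * Chain walk modes:
   * RT_MUTEX_MIN_CHAINWALK:  stop as soon as no further boosting is
   *			      necessary
   * RT_MUTEX_FULL_CHAINWALK: walk the full chain (deadlock detection)
   */
  enum rtmutex_chainwalk {
  	RT_MUTEX_MIN_CHAINWALK,
  	RT_MUTEX_FULL_CHAINWALK,
  };

  static enum rtmutex_chainwalk
  rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter,
  			      enum rtmutex_chainwalk chwalk)
  {
  	/* hides the CONFIG_DEBUG_RT_MUTEXES dependency from callers */
  	return debug_rt_mutex_detect_deadlock(waiter, chwalk);
  }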
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20140522031949.947264874@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
kernel/locking/rtmutex.c
upstream commit: c051b21f71
The deadlock logic is only required for futexes.
Remove the extra arguments for the public functions, and also for the
futex specific ones, which are always called with deadlock detection
enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
include/linux/rtmutex.h
kernel/locking/rtmutex.c
upstream commit: 1ca7b86062
Exit right away when the removed waiter was not the top priority
waiter on the lock, and get rid of the extra indent level.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
kernel/locking/rtmutex.c
upstream commit: 3eb65aeadf
Add commentary to document the chain walk and the protection mechanisms
and their scope.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: a57594a13a
Add a separate local variable for the boost/deboost logic to make the
code more readable. Add comments where appropriate.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
kernel/locking/rtmutex.c
upstream commit: 2ffa5a5cd2
There is no point in keeping the task ref across the check for the lock
owner. Drop the ref before that, so the protection context is clear.
Found while documenting the chain walk.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
upstream commit: 358c331f39
The current implementation of try_to_take_rtmutex() is correct, but
requires more than a single brain twist to understand the cleverly
encoded conditionals.
Untangle it and document the cases properly.
Looks less efficient at first glance, but actually reduces the
binary code size on x86_64 by 80 bytes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
kernel/locking/rtmutex.c
upstream commit: 88f2b4c15e
Oleg noticed that rt_mutex_slowtrylock() has a pointless check for
rt_mutex_owner(lock) != current.
To avoid calling try_to_take_rtmutex() we really want to check whether
the lock has an owner at all, or whether the trylock failed because the
owner is NULL but the RT_MUTEX_HAS_WAITERS bit is set. This covers
the "lock is owned by caller" situation as well.
We can actually do this check lockless. A trylock is taking a chance
anyway, whether or not we take lock->wait_lock to do the check.
Add comments to the function while at it.
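Sketch of the result:

  static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
  {
  	int ret;

  	/*
  	 * If the lock already has an owner we fail right away. This
  	 * is a trylock, so checking without wait_lock is fine: we
  	 * are taking a chance either way.
  	 */
  	if (rt_mutex_owner(lock))
  		return 0;

  	/* owner is NULL: take wait_lock and try to acquire */
  	raw_spin_lock(&lock->wait_lock);
  	ret = try_to_take_rt_mutex(lock, current, NULL);
  	/* clean up a possibly stale RT_MUTEX_HAS_WAITERS bit */
  	fixup_rt_mutex_waiters(lock);
  	raw_spin_unlock(&lock->wait_lock);

  	return ret;
  }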
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
kernel/locking/rtmutex.c