linux

Commit Graph

Author	SHA1	Message	Date
Paul Gortmaker	230f7e0d38	sas-ata/isci: dont't disable interrupts in qc_issue handler On 3.14-rt we see the following trace on Canoe Pass for SCSI_ISCI "Intel(R) C600 Series Chipset SAS Controller" when the sas qc_issue handler is run: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:905 in_atomic(): 0, irqs_disabled(): 1, pid: 432, name: udevd CPU: 11 PID: 432 Comm: udevd Not tainted 3.14.28-rt22 #2 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013 ffff880fab500000 ffff880fa9f239c0 ffffffff81a2d273 0000000000000000 ffff880fa9f239d8 ffffffff8107f023 ffff880faac23dc0 ffff880fa9f239f0 ffffffff81a33cc0 ffff880faaeb1400 ffff880fa9f23a40 ffffffff815de891 Call Trace: [<ffffffff81a2d273>] dump_stack+0x4e/0x7a [<ffffffff8107f023>] __might_sleep+0xe3/0x160 [<ffffffff81a33cc0>] rt_spin_lock+0x20/0x50 [<ffffffff815de891>] isci_task_execute_task+0x171/0x2f0 <----- [<ffffffff815cfecb>] sas_ata_qc_issue+0x25b/0x2a0 [<ffffffff81606363>] ata_qc_issue+0x1f3/0x370 [<ffffffff8160c600>] ? ata_scsi_invalid_field+0x40/0x40 [<ffffffff8160c8f5>] ata_scsi_translate+0xa5/0x1b0 [<ffffffff8160efc6>] ata_sas_queuecmd+0x86/0x280 [<ffffffff815ce446>] sas_queuecommand+0x196/0x230 [<ffffffff81081fad>] ? get_parent_ip+0xd/0x50 [<ffffffff815b05a4>] scsi_dispatch_cmd+0xb4/0x210 [<ffffffff815b7744>] scsi_request_fn+0x314/0x530 and gdb shows: (gdb) list * isci_task_execute_task+0x171 0xffffffff815ddfb1 is in isci_task_execute_task (drivers/scsi/isci/task.c:138). 
133 dev_dbg(&ihost->pdev->dev, "%s: num=%d\n", __func__, num); 134 135 for_each_sas_task(num, task) { 136 enum sci_status status = SCI_FAILURE; 137 138 spin_lock_irqsave(&ihost->scic_lock, flags); <----- 139 idev = isci_lookup_device(task->dev); 140 io_ready = isci_device_io_ready(idev, task); 141 tag = isci_alloc_tag(ihost); 142 spin_unlock_irqrestore(&ihost->scic_lock, flags); (gdb) In addition to the scic_lock, the function also contains locking of the task_state_lock -- which is clearly not a candidate for raw lock conversion. As can be seen by the comment nearby, we really should be running the qc_issue code with interrupts enabled anyway. Cc: stable-rt@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Yang Shi	7768fdc079	mips: rt: Replace pagefault_* to raw version In k{un}map_coherent, pagefault_disable and pagefault_enable are called respectively, but k{un}map_coherent needs preempt disabled according to commit `f8829caee3` ("[MIPS] Fix aliasing bug in copy_to_user_page / copy_from_user_page") to avoid dcache alias on COW. k{un}map_coherent are just called when cpu_has_dc_aliases == 1 with VIPT cache. However, actually, the most modern MIPS processors have PIPT dcache without dcache alias issue. In such case, k{un}map_atomic will be called with preempt enabled. To fix this, we replace pagefault_* to raw version in k{un}map_coherent, which disables preempt, otherwise the following kernel panic may be caught: CPU 0 Unable to handle kernel paging request at virtual address fffffffffffd5000, epc == ffffffff80122c00, ra == ffffffff8011fbcc Oops[#1]: CPU: 0 PID: 409 Comm: runltp Not tainted 3.14.17-rt5 #1 task: 980000000fa936f0 ti: 980000000eed0000 task.ti: 980000000eed0000 $ 0 : 0000000000000000 000000001400a4e1 fffffffffffd5000 0000000000000001 $ 4 : 980000000cded000 fffffffffffd5000 980000000cdedf00 ffffffffffff00fe $ 8 : 0000000000000000 ffffffffffffff00 000000000000000d 0000000000000004 $12 : 980000000eed3fe0 000000000000a400 ffffffffa00ae278 0000000000000000 $16 : 980000000cded000 000000726eb855c8 98000000012ccfe8 ffffffff8095e0c0 $20 : ffffffff80ad0000 ffffffff8095e0c0 98000000012d0bd8 980000000fb92000 $24 : 0000000000000000 ffffffff80177fb0 $28 : 980000000eed0000 980000000eed3b60 980000000fb92060 ffffffff8011fbcc Hi : 000000000002cb02 Lo : 000000000000ee56 epc : ffffffff80122c00 copy_page+0x38/0x548 Not tainted ra : ffffffff8011fbcc copy_user_highpage+0x16c/0x180 Status: 1400a4e3 KX SX UX KERNEL EXL IE Cause : 10800408 BadVA : fffffffffffd5000 PrId : 00010000 (MIPS64R2-generic) Modules linked in: i2c_piix4 i2c_core uhci_hcd Process runltp (pid: 409, threadinfo=980000000eed0000, task=980000000fa936f0, tls=000000fff7756700) Stack : 98000000012ccfe8 
980000000eeb7ba8 980000000ecc7508 000000000666da5b 000000726eb855c8 ffffffff802156e0 000000726ea4a000 98000000010007e0 980000000fb92060 0000000000000000 0000000000000000 6db6db6db6db6db7 0000000000000080 000000726eb855c8 980000000fb92000 980000000eeeec28 980000000ecc7508 980000000fb92060 0000000000000001 00000000000000a9 ffffffff80995e60 ffffffff80218910 000000001400a4e0 ffffffff804efd24 980000000ee25b90 ffffffff8079cec4 ffffffff8079d49c ffffffff80979658 000000000666da5b 980000000eeb7ba8 000000726eb855c8 00000000000000a9 980000000fb92000 980000000fa936f0 980000000eed3eb0 0000000000000001 980000000fb92088 0000000000030002 980000000ecc7508 ffffffff8011ecd0 ... Call Trace: [<ffffffff80122c00>] copy_page+0x38/0x548 [<ffffffff8011fbcc>] copy_user_highpage+0x16c/0x180 [<ffffffff802156e0>] do_wp_page+0x658/0xcd8 [<ffffffff80218910>] handle_mm_fault+0x7d8/0x1070 [<ffffffff8011ecd0>] __do_page_fault+0x1a0/0x508 [<ffffffff80104d84>] resume_userspace_check+0x0/0x10 Or there may be random segmentation fault happened. Cc: stable-rt@vger.kernel.org Signed-off-by: Yang Shi <yang.shi@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Yong Zhang	879e33f32e	ARM: cmpxchg: define __HAVE_ARCH_CMPXCHG for armv6 and later Both pi_stress and sigwaittest in rt-test show performance gain with __HAVE_ARCH_CMPXCHG. Testing result on coretile_express_a9x4: pi_stress -p 99 --duration=300 (on linux-3.4-rc5; bigger is better) vanilla: Total inversion performed: 5493381 patched: Total inversion performed: 5621746 sigwaittest -p 99 -l 100000 (on linux-3.4-rc5-rt6; less is better) 3.4-rc5-rt6: Min 24, Cur 27, Avg 30, Max 98 patched: Min 19, Cur 21, Avg 23, Max 96 Signed-off-by: Yong Zhang <yong.zhang0 at gmail.com> Cc: Russell King <rmk+kernel at arm.linux.org.uk> Cc: Nicolas Pitre <nico at linaro.org> Cc: Will Deacon <will.deacon at arm.com> Cc: Catalin Marinas <catalin.marinas at arm.com> Cc: Thomas Gleixner <tglx at linutronix.de> Cc: linux-arm-kernel at lists.infradead.org Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Sebastian Andrzej Siewior	9e1ba85f8d	arm/futex: disable preemption during futex_atomic_cmpxchg_inatomic() The ARM UP implementation of futex_atomic_cmpxchg_inatomic() assumes that pagefault_disable() inherits a preempt disabled section. This assumption is true for mainline but -RT reverts this and allows preemption in pagefault disabled regions. The code sequence of futex_atomic_cmpxchg_inatomic(): \| x = futex; \| if (x == oldval) \| futex = newval; The problem occurs if the code is preempted after reading the futex value or after comparing it with x. While preempted, the futex owner has to be scheduled which then releases the lock (in userland because it has no waiter yet). Once the code is back on the CPU, it overwrites the futex value with the old PID and the waiter bit set. The workaround is to explicitly disable preemption to avoid the described race window. Debugged-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Yadi.hu	6876e235a9	ARM: enable irq in translation/section permission fault handlers Probably happens on all ARM, with CONFIG_PREEMPT_RT_FULL and CONFIG_DEBUG_ATOMIC_SLEEP. This simple program: int main() { *((char*)0xc0001000) = 0; }; [ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658 [ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a [ 512.743217] INFO: lockdep is turned off. [ 512.743360] irq event stamp: 0 [ 512.743482] hardirqs last enabled at (0): [< (null)>] (null) [ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0 [ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0 [ 512.744303] softirqs last disabled at (0): [< (null)>] (null) [ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104) [ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24) [ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0) [ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c) [ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0) [ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c) [ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8) [ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94) [ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48) [ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0) [ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8) 0xc0000000 belongs to the kernel address space, so a user task must not be allowed to access it. In this case the correct result is that the test case receives a segmentation fault and exits, without producing the splat above. The root cause is commit `02fe2845d6` ("avoid enabling interrupts in prefetch/data abort handlers"), which deleted the irq enable block in the Data abort assembly code and moved it into the page/breakpoint/alignment fault handlers instead. But the author did not enable irq in the translation/section permission fault handlers. 
ARM disables irq when it enters exception/ interrupt mode, if kernel doesn't enable irq, it would be still disabled during translation/section permission fault. We see the above splat because do_force_sig_info is still called with IRQs off, and that code eventually does a: spin_lock_irqsave(&t->sighand->siglock, flags); As this is architecture independent code, and we've not seen any other need for other arch to have the siglock converted to raw lock, we can conclude that we should enable irq for ARM translation/section permission exception. Cc: stable-rt@vger.kernel.org Signed-off-by: Yadi.hu <yadi.hu@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Mike Galbraith	53d7783841	x86: UV: raw_spinlock conversion Shrug. Lots of hobbyists have a beast in their basement, right? Cc: stable-rt@vger.kernel.org Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Gustavo Bittencourt	dc678d68b7	rtmutex: enable deadlock detection in ww_mutex_lock functions The functions ww_mutex_lock_interruptible and ww_mutex_lock should return -EDEADLK when faced with a deadlock. To do so, the parameter detect_deadlock in rt_mutex_slowlock must be TRUE. This patch corrects potential deadlocks when running PREEMPT_RT with nouveau driver. Cc: stable-rt@vger.kernel.org Signed-off-by: Gustavo Bittencourt <gbitten@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Mike Galbraith	0d54782ed8	rt,locking: fix __ww_mutex_lock_interruptible() lockdep annotation Using mutex_acquire_nest() as used in __ww_mutex_lock() fixes the splat below. Remove superfluous line break in __ww_mutex_lock() as well. \|============================================= \|[ INFO: possible recursive locking detected ] \|3.14.4-rt5 #26 Not tainted \|--------------------------------------------- \|Xorg/4298 is trying to acquire lock: \| (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau] \|but task is already holding lock: \| (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau] \|other info that might help us debug this: \| Possible unsafe locking scenario: \| CPU0 \| ---- \| lock(reservation_ww_class_mutex); \| lock(reservation_ww_class_mutex); \| \| * DEADLOCK * \| \| May be due to missing lock nesting notation \| \|3 locks held by Xorg/4298: \| #0: (&cli->mutex){+.+.+.}, at: [<ffffffffa02b597b>] nouveau_abi16_get+0x2b/0x100 [nouveau] \| #1: (reservation_ww_class_acquire){+.+...}, at: [<ffffffffa0160cd2>] drm_ioctl+0x4d2/0x610 [drm] \| #2: (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau] Cc: stable-rt@vger.kernel.org Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Brad Mouring	c9ac083943	rtmutex.c: Fix incorrect waiter check In task_blocks_on_lock, there's a null check on pi_blocked_on of the task_struct. This pointer can encode the fact that the task that contains the pointer is waking (preventing requeuing) and therefore is non-null. Use the inline function to avoid dereferencing an invalid "pointer" Signed-off-by: Brad Mouring <brad.mouring@ni.com> Reported-by: Ben Shelton <ben.shelton@ni.com> Reviewed-by: T Makphaibulchoke <tmac@hp.com> Tested-by: T Makphaibulchoke <tmac@hp.com> Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Sebastian Andrzej Siewior	7ffb28ebaa	locking/rt-mutex: avoid a NULL pointer dereference on deadlock With task_blocks_on_rt_mutex() returning early -EDEADLK we never add the waiter to the waitqueue. Later, we try to remove it via remove_waiter() and go boom in rt_mutex_top_waiter() because rb_entry() gives a NULL pointer. Tested on v3.18-RT where rtmutex is used for regular mutex and I tried to get one twice in a row. Not sure when this started but I guess `397335f00` ("rtmutex: Fix deadlock detector for real") or commit `3d5c9340` ("rtmutex: Handle deadlock detection smarter"). Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Thomas Gleixner	ef453cd0d4	futex: Simplify futex_lock_pi_atomic() and make it more robust upstream commit: `af54d6a1c3` futex_lock_pi_atomic() is a maze of retry hoops and loops. Reduce it to simple and understandable states: First step is to look up existing waiters (state) in the kernel. If there is an existing waiter, validate it and attach to it. If there is no existing waiter, check the user space value. If the TID encoded in the user space value is 0, take over the futex preserving the owner died bit. If the TID encoded in the user space value is != 0, look up the owner task, validate it and attach to it. Reduces text size by 128 bytes on x86_64. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Cc: Darren Hart <darren@dvhart.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406131137020.5170@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Thomas Gleixner	c3cefde051	futex: Split out the first waiter attachment from lookup_pi_state() upstream commit: `04e1b2e52b` We want to be a bit more clever in futex_lock_pi_atomic() and separate the possible states. Split out the code which attaches the first waiter to the owner into a separate function. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.271300614@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Thomas Gleixner	16268bd779	futex: Split out the waiter check from lookup_pi_state() upstream commit: `e60cbc5cea` We want to be a bit more clever in futex_lock_pi_atomic() and separate the possible states. Split out the waiter verification into a separate function. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.180458410@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Thomas Gleixner	9e62263a76	futex: Use futex_top_waiter() in lookup_pi_state() upstream commit: `bd1dbcc67c` No point in open coding the same function again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.092947239@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Thomas Gleixner	5e3746259e	futex: Make unlock_pi more robust upstream commit: `ccf9e6a80d` The kernel tries to atomically unlock the futex without checking whether there is kernel state associated to the futex. So if user space manipulated the user space value, this will leave kernel internal state around associated to the owner task. For robustness sake, lookup first whether there are waiters on the futex. If there are waiters, wake the top priority waiter with all the proper sanity checks applied. If there are no waiters, do the atomic release. We do not have to preserve the waiters bit in this case, because a potentially incoming waiter is blocked on the hb->lock and will acquire the futex atomically. We neither have to preserve the owner died bit. The caller is the owner and it was supposed to cleanup the mess. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Kees Cook <kees@outflux.net> Cc: wad@chromium.org Link: http://lkml.kernel.org/r/20140611204237.016987332@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Thomas Gleixner	41e57dbd40	rtmutex: Avoid pointless requeueing in the deadlock detection chain walk upstream commit: `67792e2cab` In case the deadlock detector is enabled we follow the lock chain to the end in rt_mutex_adjust_prio_chain, even if we could stop earlier due to the priority/waiter constellation. But once we are no longer the top priority waiter in a certain step or the task holding the lock has already the same priority then there is no point in dequeuing and enqueuing along the lock chain as there is no change at all. So stop the queueing at this point. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Link: http://lkml.kernel.org/r/20140522031950.280830190@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:24 +03:00
Thomas Gleixner	401bec21e8	rtmutex: Cleanup deadlock detector debug logic upstream commit: `8930ed80f9` The conditions under which deadlock detection is conducted are unclear and undocumented. Add constants instead of using 0/1 and provide a selection function which hides the additional debug dependency from the calling code. Add comments where needed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Link: http://lkml.kernel.org/r/20140522031949.947264874@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Conflicts: kernel/locking/rtmutex.c	2020-10-14 00:59:24 +03:00
Thomas Gleixner	0f48389512	rtmutex: Confine deadlock logic to futex upstream commit: `c051b21f71` The deadlock logic is only required for futexes. Remove the extra arguments for the public functions and also for the futex specific ones which get always called with deadlock detection enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Conflicts: include/linux/rtmutex.h kernel/locking/rtmutex.c	2020-10-14 00:59:23 +03:00
Thomas Gleixner	41e69f2371	rtmutex: Simplify remove_waiter() upstream commit: `1ca7b86062` Exit right away, when the removed waiter was not the top priority waiter on the lock. Get rid of the extra indent level. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Conflicts: kernel/locking/rtmutex.c	2020-10-14 00:59:23 +03:00
Thomas Gleixner	5974d85ed0	rtmutex: Document pi chain walk upstream commit: `3eb65aeadf` Add commentary to document the chain walk and the protection mechanisms and their scope. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:23 +03:00
Thomas Gleixner	af64a14546	rtmutex: Clarify the boost/deboost part upstream commit: `a57594a13a` Add a separate local variable for the boost/deboost logic to make the code more readable. Add comments where appropriate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Conflicts: kernel/locking/rtmutex.c	2020-10-14 00:59:23 +03:00
Thomas Gleixner	5c1c8184e8	rtmutex: No need to keep task ref for lock owner check upstream commit: `2ffa5a5cd2` There is no point to keep the task ref across the check for lock owner. Drop the ref before that, so the protection context is clear. Found while documenting the chain walk. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:23 +03:00
Thomas Gleixner	b69f25004a	rtmutex: Simplify and document try_to_take_rtmutex() upstream commit: `358c331f39` The current implementation of try_to_take_rtmutex() is correct, but requires more than a single brain twist to understand the cleverly encoded conditionals. Untangle it and document the cases properly. Looks less efficient at first glance, but actually reduces the binary code size on x86_64 by 80 bytes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Conflicts: kernel/locking/rtmutex.c	2020-10-14 00:59:23 +03:00
Thomas Gleixner	ba1d3cbb24	rtmutex: Simplify rtmutex_slowtrylock() upstream-commit: `88f2b4c15e` Oleg noticed that rtmutex_slowtrylock() has a pointless check for rt_mutex_owner(lock) != current. To avoid calling try_to_take_rtmutex() we really want to check whether the lock has an owner at all or whether the trylock failed because the owner is NULL, but the RT_MUTEX_HAS_WAITERS bit is set. This covers the lock is owned by caller situation as well. We can actually do this check lockless. trylock is taking a chance whether we take lock->wait_lock to do the check or not. Add comments to the function while at it. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Conflicts: kernel/locking/rtmutex.c	2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior	88f8541ddd	gpio: omap: use raw locks for locking This patch converts gpio_bank.lock from a spin_lock into a raw_spin_lock. The call path to access this lock is always under a raw_spin_lock, for instance - __setup_irq() holds &desc->lock with irq off + __irq_set_trigger() + omap_gpio_irq_type() - handle_level_irq() (runs with irqs off therefore raw locks) + mask_ack_irq() + omap_gpio_mask_irq() This fixes the obvious backtrace on -RT. However, the locking vs. context is still not correct, and this is not limited to -RT: - omap_gpio_irq_type() is called with IRQ off and has a conditional call to pm_runtime_get_sync() which may sleep. Either it may happen or it may not happen but pm_runtime_get_sync() should not be called with irqs off. - omap_gpio_debounce() is holding the lock with IRQs off. + omap2_set_gpio_debounce() + clk_prepare_enable() + clk_prepare() this one might sleep. The number of users of gpiod_set_debounce() / gpio_set_debounce() looks low but still this is not good. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:23 +03:00
Thomas Gleixner	9673232a79	workqueue: Prevent deadlock/stall on RT Austin reported an XFS deadlock/stall on RT where scheduled work never gets executed and tasks are waiting for each other forever. The underlying problem is the modification of the RT code to the handling of workers which are about to go to sleep. In mainline a worker thread which goes to sleep wakes an idle worker if there is more work to do. This happens from the guts of the schedule() function. On RT this must be outside and the accessed data structures are not protected against scheduling due to the spinlock to rtmutex conversion. So the naive solution to this was to move the code outside of the scheduler and protect the data structures by the pool lock. That approach turned out to be a little naive as we cannot call into that code when the thread blocks on a lock, as it is not allowed to block on two locks in parallel. So we don't call into the worker wakeup magic when the worker is blocked on a lock, which causes the deadlock/stall observed by Austin and Mike. Looking deeper into that worker code it turns out that the only relevant data structure which needs to be protected is the list of idle workers which can be woken up. So the solution is to protect the list manipulation operations with preempt_enable/disable pairs on RT and call unconditionally into the worker code even when the worker is blocked on a lock. The preemption protection is safe as there is nothing which can fiddle with the list outside of thread context. Reported-and_tested-by: Austin Schuh <austin@peloton-tech.com> Reported-and_tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:23 +03:00
Steven Rostedt	504c1e6c4a	sched: Do not clear PF_NO_SETAFFINITY flag in select_fallback_rq() I talked with Peter Zijlstra about this, and he told me that the clearing of the PF_NO_SETAFFINITY flag was to deal with the optimization of migrate_disable/enable() that ignores tasks that have that flag set. But that optimization was removed when I did a rework of the cpu hotplug code. I found that ignoring tasks that had that flag set would cause those tasks to not sync with the hotplug code and cause the kernel to crash. Thus it needed to not treat them special and those tasks had to go though the same work as tasks without that flag set. Now that those tasks are not treated special, there's no reason to clear the flag. May still need to be tested as the migrate_me() code does not ignore those flags. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140701111444.0cfebaa1@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior	cd93a88a67	disable preempt lazy on x86-64 it still explodes Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior	28abbe8efe	md: disable bcache It uses anon semaphores \|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’: \|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration] \| up_read_non_owner(&dc->writeback_lock); \| ^ \|drivers/md/bcache/request.c: In function ‘request_write’: \|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration] \| down_read_non_owner(&dc->writeback_lock); \| ^ either we get rid of those or we have to introduce them… Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:23 +03:00
Steven Rostedt	646a8ab0b1	rt,ntp: Move call to schedule_delayed_work() to helper thread The ntp code for notify_cmos_timer() is called from a hard interrupt context. schedule_delayed_work() under PREEMPT_RT_FULL calls spinlocks that have been converted to mutexes, thus calling schedule_delayed_work() from interrupt is not safe. Add a helper thread that does the call to schedule_delayed_work and wake up that thread instead of calling schedule_delayed_work() directly. This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls schedule_delayed_work() directly in irq context. Note: There's a few places in the kernel that do this. Perhaps the RT code should have a dedicated thread that does the checks. Just register a notifier on boot up for your check and wake up the thread when needed. This will be a todo. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior	b1fcb3c08e	a few open coded completions Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:23 +03:00
Thomas Gleixner	0d3b12ccc6	completion: Use simple wait queues Completions have no long lasting callbacks and therefore do not need the complex waitqueue variant. Use simple waitqueues which reduces the contention on the waitqueue lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:23 +03:00
Thomas Gleixner	a4203240fa	rcu-more-swait-conversions.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Merged Steven's static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp) { - swait_wake(&rnp->nocb_gp_wq[rnp->completed & 0x1]); + wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]); } Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior	4437a7dee2	kernel/treercu: use a simple waitqueue Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:23 +03:00
Paul Gortmaker	ca0179a36a	simple-wait: rename and export the equivalent of waitqueue_active() The function "swait_head_has_waiters()" was internalized into wait-simple.c but it parallels the waitqueue_active of normal waitqueue support. Given that there are over 150 waitqueue_active users in drivers/ fs/ kernel/ and the like, let's make it globally visible, and rename it to parallel the waitqueue_active accordingly. We'll need to do this if we expect to expand its usage beyond RT. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:23 +03:00
Thomas Gleixner	65870d64d3	wait-simple: Rework for use with completions Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:23 +03:00
Thomas Gleixner	99076b3731	wait-simple: Simple waitqueue implementation wait_queue is a Swiss army knife and in most cases the complexity is not needed. For RT, waitqueues are a constant source of trouble as we can't convert the head lock to a raw spinlock due to fancy and long-lasting callbacks. Provide a slim version, which allows RT to replace wait queues. This should go mainline as well, as it lowers memory consumption and runtime overhead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> smp_mb() added by Steven Rostedt to fix a race condition with swait wakeups vs adding items to the list.	2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior	e49d664aa7	wait.h: include atomic.h | CC init/main.o |In file included from include/linux/mmzone.h:9:0, | from include/linux/gfp.h:4, | from include/linux/kmod.h:22, | from include/linux/module.h:13, | from init/main.c:15: |include/linux/wait.h: In function ‘wait_on_atomic_t’: |include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration] | if (atomic_read(val) == 0) | ^ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior	0e144af33c	drm/i915: drop trace_i915_gem_ring_dispatch on rt This tracepoint is responsible for: |[<814cc358>] __schedule_bug+0x4d/0x59 |[<814d24cc>] __schedule+0x88c/0x930 |[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50 |[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50 |[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250 |[<814d27d9>] schedule+0x29/0x70 |[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278 |[<814d3786>] rt_spin_lock+0x26/0x30 |[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915] |[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915] |[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915] |[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915] |[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60 |[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180 |[<8101d063>] ? native_sched_clock+0x13/0x80 |[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915] |[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm] |[<8101d0d9>] ? sched_clock+0x9/0x10 |[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915] |[<810f1c18>] ? rb_commit+0x68/0xa0 |[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0 |[<81197467>] do_vfs_ioctl+0x97/0x540 |[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130 |[<811979a1>] sys_ioctl+0x91/0xb0 |[<814db931>] tracesys+0xe1/0xe6 Chris Wilson does not like to move i915_trace_irq_get() out of the macro |No. This enables the IRQ, as well as making a number of |very expensively serialised read, unconditionally. so it is gone now on RT. Cc: stable-rt@vger.kernel.org Reported-by: Joakim Hernberg <jbh@alchemy.lu> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior	5550291744	gpu/i915: don't open code these things The open-coded part is gone as of `1f83fee0` ("drm/i915: clear up wedged transitions"); the owner check is still there. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:22 +03:00
Thomas Gleixner	11fd979912	mmci: Remove bogus local_irq_save() On !RT the interrupt handler runs with interrupts disabled. On RT it runs in a thread, so there is no need to disable interrupts at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior	b83d9c8fd4	i2c/omap: drop the lock in hard irq context The lock is taken while reading two registers. On RT the first lock is taken in hard irq, where it might sleep, and in the threaded irq. The threaded irq runs in oneshot mode, so the hard irq does not run until the thread completes and there is no reason to grab the lock. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior	16a70d5345	leds: trigger: disable CPU trigger on -RT as it triggers: |CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141 |[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20) |[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c) |[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170) |[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38) |[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c) |[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c) |[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c) |[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) |[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234) |[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0) |[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380) Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:22 +03:00
Thomas Gleixner	322d3dd7f7	powerpc-preempt-lazy-support.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:22 +03:00
Thomas Gleixner	de01f48d58	arm-preempt-lazy-support.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:22 +03:00
Thomas Gleixner	5ccf423693	x86-preempt-lazy.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:22 +03:00
Thomas Gleixner	b33545443e	sched: Add support for lazy preemption It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which immediately preempt the waking task while the waking task holds a lock on which the woken task will block right after having preempted the waker. In mainline this is prevented by the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks. For a SCHED_OTHER task preempting another SCHED_OTHER task, though, this is really not a correctness issue. RT folks are concerned about preemption by SCHED_FIFO/RR tasks and not about the purely fairness driven SCHED_OTHER preemption latencies. So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the existing preempt_count, each task now sports a preempt_lazy_count which is manipulated on lock acquisition and release. This is slightly incorrect, as for laziness reasons I coupled this to migrate_disable/enable, so some other mechanisms get the same treatment (e.g. get_cpu_light). Now on the scheduler side, instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefore allows the waking task to exit the lock held region before the woken task preempts. That also works better for cross CPU wakeups as the other side can stay in the adaptive spinning loop. For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter. 
Initial tests do not expose any observable latency increase, but history shows that I've been proven wrong before :) The lazy preemption mode is on by default, but with CONFIG_SCHED_DEBUG enabled it can be disabled via: # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features and reenabled via # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features The test results so far are very machine and workload dependent, but there is a clear trend that it enhances the non RT workload performance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior	1b1950518f	rcu: make RCU_BOOST default on RT Since it is no longer invoked from the softirq, people run into OOM more often if the priority of the RCU thread is too low. Making boosting default on RT should help in those cases and it can be switched off if someone knows better. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:22 +03:00
Paul E. McKenney	21f9c8f24c	rcu: Eliminate softirq processing from rcutree Running RCU out of softirq is a problem for some workloads that would like to manage RCU core processing independently of other softirq work, for example, setting kthread priority. This commit therefore moves the RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread named rcuc. The SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:22 +03:00
Thomas Gleixner	e69d00b006	rcu: Disable RCU_FAST_NO_HZ on RT This uses a timer_list timer from the irq disabled guts of the idle code. Disable it for now to prevent wreckage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:22 +03:00
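
Note on b33545443e ("sched: Add support for lazy preemption"): the decision rule described in that commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel implementation: the struct, enum, and function names below are hypothetical, and the check that a non-zero preempt_count blocks any preemption reflects ordinary mainline semantics rather than anything spelled out in the message.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_class { MODEL_SCHED_OTHER, MODEL_SCHED_RT };

struct model_task {
    enum model_class cls;
    int preempt_count;       /* models the existing preempt_count */
    int preempt_lazy_count;  /* models the new preempt_lazy_count */
    bool need_resched;       /* models NEED_RESCHED */
    bool need_resched_lazy;  /* models NEED_RESCHED_LAZY */
};

/*
 * A woken task wants to preempt the current one.
 * SCHED_OTHER preempting SCHED_OTHER only sets the lazy flag;
 * any RT involvement sets the regular flag, as before.
 */
void model_resched(struct model_task *curr, const struct model_task *woken)
{
    if (curr->cls == MODEL_SCHED_OTHER && woken->cls == MODEL_SCHED_OTHER)
        curr->need_resched_lazy = true;
    else
        curr->need_resched = true;
}

/*
 * Evaluated at a preemption point of the current task.
 * The regular flag ignores the lazy counter; the lazy flag is honoured
 * only once the lazy counter has dropped back to zero, i.e. after the
 * task has left its lock held region.
 */
bool model_should_preempt(const struct model_task *curr)
{
    if (curr->preempt_count != 0)
        return false;
    if (curr->need_resched)
        return true;
    return curr->need_resched_lazy && curr->preempt_lazy_count == 0;
}
```

In this model a SCHED_OTHER waker holding a lock (preempt_lazy_count > 0) keeps running until it releases the lock, while a woken RT task preempts it right away.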

431760 Commits All Branches Search
