Commit Graph

892082 Commits

Author SHA1 Message Date
Thomas Gleixner 2cac4318fb net: Use cpu_chill() instead of cpu_relax()
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
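
The converted pattern, as a minimal sketch (the lock and loop body are illustrative, not the actual net/ code):

   while (!spin_trylock(&some_lock)) {
           /* cpu_relax() spins forever on RT if the lock holder was
            * preempted; cpu_chill() sleeps so the holder can run. */
           cpu_chill();            /* was: cpu_relax(); */
   }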

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 15387f991a fs: namespace: Use cpu_chill() in trylock loops
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 125c578f0b block: Use cpu_chill() for retry loops
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.

Use cpu_chill() instead of cpu_relax() to let the system
make progress.

[bigeasy: After all those changes that occurred over the years, this one hunk
is left and should not cause any starvation on -RT anymore]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 9b9cfa4b04 rt: Introduce cpu_chill()
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.

Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.

+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().

+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
|  #0:  (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
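
Putting the pieces together, a minimal sketch of the resulting cpu_chill()
(reconstructed from the notes above, not copied from the patch; the exact
timeout and flag handling are assumptions):

   #ifdef CONFIG_PREEMPT_RT
   void cpu_chill(void)
   {
           ktime_t chill_time = ktime_set(0, TICK_NSEC);   /* "a tick", per above */
           unsigned int freeze_flag = current->flags & PF_NOFREEZE;

           set_current_state(TASK_UNINTERRUPTIBLE);
           current->flags |= PF_NOFREEZE;  /* avoid the udevd splat above */
           schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL);
           if (!freeze_flag)
                   current->flags &= ~PF_NOFREEZE;
   }
   #else
   #define cpu_chill()     cpu_relax()
   #endif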

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:28 +03:00
Mike Galbraith 3ebd0b93c2 sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 2af2485dd8 scsi/fcoe: Make RT aware.
Do not disable preemption while taking sleeping locks. All users look safe
for migrate_disable() only.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner fde5499a8f md: raid5: Make raid5_percpu handling RT aware
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.

Serialize the access to the percpu data with a lock and keep the code
preemptible.
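
A sketch of the pattern (lock placement and field names are illustrative,
not the exact patch):

   struct raid5_percpu *percpu;

   percpu = this_cpu_ptr(conf->percpu);
   spin_lock(&percpu->lock);       /* serialises access; code stays preemptible */
   /* ... __raid_run_ops() work on the percpu scratch data ... */
   spin_unlock(&percpu->lock);     /* was: get_cpu()/put_cpu() around this */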

Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
2023-03-25 04:21:28 +03:00
Sebastian Andrzej Siewior bad3589dc7 block/mq: don't complete requests via IPI
The IPI runs in hardirq context and the completion path takes sleeping locks.
Assume caches are shared and complete requests on the local CPU.
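
A sketch of the resulting policy check (helper name as in recent mainline,
function shape simplified and illustrative):

   static inline bool blk_mq_complete_need_ipi(struct request *rq)
   {
           if (IS_ENABLED(CONFIG_PREEMPT_RT))
                   return false;   /* IPI runs in hardirq: complete locally */
           /* ... usual shared-cache / same-CPU checks ... */
           return true;
   }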

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:28 +03:00
Sebastian Andrzej Siewior 4abe6b99f0 block/mq: do not invoke preempt_disable()
preempt_disable() and get_cpu() don't play well with the sleeping
locks acquired later in the code path.
It seems to be enough to replace them with get_cpu_light() and
migrate_disable().
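
For reference, the substitution pattern (get_cpu_light()/put_cpu_light() are
the RT patch-set helpers; the surrounding code is illustrative):

   cpu = get_cpu_light();  /* migrate_disable() instead of preempt_disable() */
   /* ... code that may take sleeping locks on RT ... */
   put_cpu_light();        /* migrate_enable() */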

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 457d5cccfb mm/vmalloc: Another preempt disable region which sucks
Avoid the preempt disable version of get_cpu_var(). The inner lock should
provide enough serialisation.
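
A sketch of the conversion (names follow mm/vmalloc.c but are used
illustratively here):

   struct vmap_block_queue *vbq;

   vbq = this_cpu_ptr(&vmap_block_queue);  /* was: &get_cpu_var(vmap_block_queue) */
   spin_lock(&vbq->lock);                  /* the inner lock serialises */
   list_add_tail_rcu(&vb->free_list, &vbq->free);
   spin_unlock(&vbq->lock);                /* no put_cpu_var() needed */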

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner dfd47e14f8 fs/epoll: Do not disable preemption on RT
ep_call_nested() takes a sleeping lock, so we can't disable preemption.
The light version is enough since ep_call_nested() doesn't mind being
invoked twice on the same CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:27 +03:00
Scott Wood a5850930e9 rcutorture: Avoid problematic critical section nesting on RT
rcutorture was generating some nesting scenarios that are not
reasonable.  Constrain the state selection to avoid them.

Example #1:

1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()

On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context.  Thus, atomic context must be retained until after BH
is re-enabled.  Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.

Example #2:

1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()

If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled.  Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().

For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.

Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Julia Cartwright 56ca6ad996 rcu: enable rcu_normal_after_boot by default for RT
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.

By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT.
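
A sketch of the change in kernel/rcu/update.c (the config symbol spelling is
an assumption for this tree):

   static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT);
   module_param(rcu_normal_after_boot, int, 0);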

Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Sebastian Andrzej Siewior 277efec1fa srcu: replace local_irqsave() with a locallock
There are two instances which disable interrupts in order to get a
stable this_cpu_ptr() pointer. The restore part is coupled with
spin_unlock_irqrestore(), which does not work on RT.
Replace the local_irq_save() call with the appropriate local_lock()
version of it.
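
A sketch of the pattern, using the local_lock API with illustrative names
(the actual srcu fields and lock differ):

   static DEFINE_PER_CPU(local_lock_t, srcu_llock) = INIT_LOCAL_LOCK(srcu_llock);

   local_lock_irqsave(&srcu_llock, flags);         /* was: local_irq_save(flags); */
   sdp = this_cpu_ptr(ssp->sda);                   /* pointer is now stable */
   spin_lock(&sdp->lock);
   /* ... */
   spin_unlock(&sdp->lock);                        /* no longer _irqrestore */
   local_unlock_irqrestore(&srcu_llock, flags);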

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Scott Wood fc049aaf7b rcu: Use rcuc threads on PREEMPT_RT as we did
While switching to the reworked RCU-thread code, enabling the thread
processing on -RT was forgotten.
Besides restoring behaviour that used to be the default on RT, this
avoids a deadlock on scheduler locks.

Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Sebastian Andrzej Siewior 27de62e778 locking: Make spinlock_t and rwlock_t a RCU section on RT
On !RT a locked spinlock_t or rwlock_t disables preemption, which
implies an RCU read section. There is code that relies on that behaviour.

Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disable preemption on !RT) is acquired.
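
For illustration, the kind of reliance meant here (names are invented):

   spin_lock(&gc->lock);               /* !RT: preempt off => implicit RCU section */
   entry = rcu_dereference(gc->cache); /* relies on that implicit section */
   /* ... use entry ... */
   spin_unlock(&gc->lock);
   /* On RT the sleeping-lock acquire/release now does an explicit
    * rcu_read_lock()/rcu_read_unlock() pair, keeping this pattern valid. */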

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Sebastian Andrzej Siewior f8d8e5803c locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Upstream uses arch_spinlock_t within spinlock_t and requests that the
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock, which needs the
architecture's spinlock_types.h header file for its definition. However,
we need rt_mutex first because it is used to build the spinlock_t, so
that check does not work for us.
Therefore I am dropping that check.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Thomas Gleixner c8ff2f06be futex: workaround migrate_disable/enable in different context
migrate_enable() invokes __schedule() and it expects a preempt count of one.
Holding a raw_spinlock_t with disabled interrupts should not allow scheduling.

These little hacks ensure that we don't schedule while we lock the hb lock
with interrupts enabled and unlock it with interrupts disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[XXX: As per PeterZ's suggestion
	set_thread_flag(TIF_NEED_RESCHED); preempt_fold_need_resched()
 would trigger a scheduler invocation on the last preempt_enable(), which in
 turn would allow us to drop this.
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Thomas Gleixner 6ff7d19cf2 trace: Add migrate-disabled counter to tracing output
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:27 +03:00
Scott Wood a4aa572645 sched: migrate_enable: Remove __schedule() call
We can rely on preempt_enable() to schedule. Besides simplifying the
code, this potentially permits sequences such as the following:

migrate_disable();
preempt_disable();
migrate_enable();
preempt_enable();

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Scott Wood <swood@redhat.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Scott Wood e2e316ff2b sched: migrate_enable: Use per-cpu cpu_stop_work
Commit e6c287b1512d ("sched: migrate_enable: Use stop_one_cpu_nowait()")
adds a busy wait to deal with an edge case where the migrated thread
can resume running on another CPU before the stopper has consumed
cpu_stop_work.  However, this is done with preemption disabled and can
potentially lead to deadlock.

While it is not guaranteed that the cpu_stop_work will be consumed before
the migrating thread resumes and exits the stack frame, it is guaranteed
that nothing other than the stopper can run on the old cpu between the
migrating thread scheduling out and the cpu_stop_work being consumed.
Thus, we can store cpu_stop_work in per-cpu data without it being
reused too early.
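
A sketch of the storage change (the surrounding call is simplified; arg is
the usual migration argument structure):

   static DEFINE_PER_CPU(struct cpu_stop_work, migrate_enable_work);

   /* Preemption is still disabled here, so smp_processor_id() is stable. */
   stop_one_cpu_nowait(smp_processor_id(), migration_cpu_stop, &arg,
                       this_cpu_ptr(&migrate_enable_work));
   /* No busy wait: only the stopper can run on this CPU between our
    * schedule-out and the work item being consumed. */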

Fixes: e6c287b1512d ("sched: migrate_enable: Use stop_one_cpu_nowait()")
Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Scott Wood <swood@redhat.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Scott Wood 5be92193b8 sched: migrate_enable: Use stop_one_cpu_nowait()
migrate_enable() can be called with current->state != TASK_RUNNING.
Avoid clobbering the existing state by using stop_one_cpu_nowait().
Since we're stopping the current cpu, we know that we won't get
past __schedule() until migration_cpu_stop() has run (at least up to
the point of migrating us to another cpu).

Signed-off-by: Scott Wood <swood@redhat.com>
[bigeasy: spin until the request has been processed]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:27 +03:00
Sebastian Andrzej Siewior fe6fe368b0 sched/core: migrate_enable() must access takedown_cpu_task on !HOTPLUG_CPU
The variable takedown_cpu_task is never declared/used on !HOTPLUG_CPU
except for migrate_enable(). This leads to a link error.

Don't use takedown_cpu_task in !HOTPLUG_CPU.

Reported-by: Dick Hollenbeck <dick@softplc.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Sebastian Andrzej Siewior 1158230ec6 kernel/sched/core: add migrate_disable()
[bristot@redhat.com: rt: Increase/decrease the nr of migratory tasks when enabling/disabling migration
 Link: https://lkml.kernel.org/r/e981d271cbeca975bca710e2fbcc6078c09741b0.1498482127.git.bristot@redhat.com
]
[swood@redhat.com: fixups and optimisations
 Link: https://lkml.kernel.org/r/20190727055638.20443-1-swood@redhat.com
 Link: https://lkml.kernel.org/r/20191012065214.28109-1-swood@redhat.com
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Sebastian Andrzej Siewior 85b10d0ec5 ptrace: fix ptrace vs tasklist_lock race
As explained by Alexander Fyodorov <halcy@yandex.ru>:

|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
|  child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
|  __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()

The patch is based on his initial patch, with an additional check added
for the case where __TASK_TRACED was moved to ->saved_state. The pi_lock
is taken in case the caller is interrupted between looking into ->state
and ->saved_state.
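
A sketch of that check (the helper name is illustrative, reconstructed from
the description rather than copied from the patch):

   static bool task_is_traced_rt(struct task_struct *task)
   {
           unsigned long flags;
           bool traced = task->state & __TASK_TRACED;

           if (!traced) {
                   /* On RT the flag may have moved to ->saved_state while
                    * the task sleeps on the converted tasklist_lock. */
                   raw_spin_lock_irqsave(&task->pi_lock, flags);
                   traced = task->saved_state & __TASK_TRACED;
                   raw_spin_unlock_irqrestore(&task->pi_lock, flags);
           }
           return traced;
   }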

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Sebastian Andrzej Siewior 70c7cce455 locking/rtmutex: re-init the wait_lock in rt_mutex_init_proxy_locked()
We could provide a key class for lockdep (and fix up all callers) or
move the init to all callers (as it was) in order to avoid lockdep
seeing a double-lock of the wait_lock.

Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Scott Wood 714e8359dd locking/rt-mutex: Flush block plug on __down_read()
__down_read() bypasses the rtmutex frontend to call
rt_mutex_slowlock_locked() directly, and thus it needs to call
blk_schedule_flush_plug() itself.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Mikulas Patocka 867ae8be55 locking/rt-mutex: fix deadlock in device mapper / block-IO
When some block device driver creates a bio and submits it to another
block device driver, the bio is added to current->bio_list (in order to
avoid unbounded recursion).

However, this queuing of bios can cause deadlocks; to avoid them,
device mapper registers a function, flush_current_bio_list. This function
is called when a device mapper driver blocks. It redirects bios queued on
current->bio_list to helper workqueues, so that these bios can proceed
even if the driver is blocked.

The problem with CONFIG_PREEMPT_RT is that when the device mapper
driver blocks, it won't call flush_current_bio_list (because
tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in the
block device stack can happen.

Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked
returns true - that would cause
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a
spinlock.

So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock,
when the fast acquire fails and the task is about to block.

CC: stable-rt@vger.kernel.org
[bigeasy: The deadlock is not device-mapper specific; it can also occur
          in plain EXT4]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Sebastian Andrzej Siewior 154f5666ec rtmutex: add ww_mutex addon for mutex-rt
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Thomas Gleixner 4b3a456867 rtmutex: wire up RT's locking
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Thomas Gleixner 993b8b95ab rtmutex: add rwlock implementation based on rtmutex
The implementation is bias-based, similar to the rwsem implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Thomas Gleixner 8a55693ca5 rtmutex: add rwsem implementation based on rtmutex
The RT-specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.

The single reader restriction is painful in various ways:

 - Performance bottleneck for multi-threaded applications in the page fault
   path (mmap sem)

 - Progress blocker for drivers which are carefully crafted to avoid the
   potential reader/writer deadlock in mainline.

The analysis of the writer code paths shows that properly written RT tasks
should not take them. Syscalls like mmap() and file accesses which take the
mmap sem write-locked have unbounded latencies which are completely unrelated
to the mmap sem. Other R/W sem users like graphics drivers are not suitable
for RT tasks either.

So there is little risk of hurting RT tasks when the RT rwsem implementation
is changed in the following way:

 - Allow concurrent readers

 - Make writers block until the last reader has left the critical section.
   This blocking is not subject to priority/budget inheritance.

 - Readers blocked on a writer inherit their priority/budget in the normal
   way.

There is a drawback with this scheme. R/W semaphores become writer-unfair,
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on an RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to
rethink the approach.
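
A very rough sketch of the scheme, with invented names (the real
implementation differs in detail; in particular it does not poll):

   struct rwsem_rt {
           atomic_t        readers;        /* active readers */
           struct rt_mutex rtmutex;        /* serialises writers and entry */
   };

   static void rt_down_read(struct rwsem_rt *sem)
   {
           /* Readers blocked on a writer sleep on the rtmutex and therefore
            * boost the writer's priority/budget in the normal way. */
           rt_mutex_lock(&sem->rtmutex);
           atomic_inc(&sem->readers);
           rt_mutex_unlock(&sem->rtmutex);
   }

   static void rt_down_write(struct rwsem_rt *sem)
   {
           rt_mutex_lock(&sem->rtmutex);   /* keep new readers and writers out */
           /* Wait until the last reader has left; this wait is deliberately
            * not subject to priority/budget inheritance. */
           while (atomic_read(&sem->readers) > 0)
                   schedule_timeout_uninterruptible(1);
   }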

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Thomas Gleixner c4457270a5 rtmutex: add mutex implementation based on rtmutex
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Sebastian Andrzej Siewior dfa76b1102 rtmutex: trylock is okay on -RT
A non-RT kernel could deadlock on rt_mutex_trylock() in softirq context. On
-RT we don't run softirqs in IRQ context but in thread context, so it is
not an issue here.
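
A sketch of the resulting check (the internal fast/slow helper names are
assumptions):

   int __lockfunc rt_mutex_trylock(struct rt_mutex *lock)
   {
   #ifdef CONFIG_PREEMPT_RT
           if (WARN_ON_ONCE(in_irq() || in_nmi()))
   #else
           /* !RT: softirqs run in IRQ context, so trylock may deadlock there. */
           if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq()))
   #endif
                   return 0;

           return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
   }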

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:26 +03:00
Peter Zijlstra 749c611945 locking/rtmutex: Clean ->pi_blocked_on in the error case
The function rt_mutex_wait_proxy_lock() cleans ->pi_blocked_on in case
of failure (timeout, signal). The same cleanup is required in
__rt_mutex_start_proxy_lock().
In both cases the task was interrupted by a signal or timeout while
acquiring the lock, and after the interruption it no longer blocks on
the lock.

Fixes: 1a1fb985f2 ("futex: Handle early deadlock return correctly")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner 44f08144d7 sched: Use the proper LOCK_OFFSET for cond_resched()
RT does not increment the preempt count when a 'sleeping' spinlock is
locked. Update PREEMPT_LOCK_OFFSET for that case.
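
A sketch of the adjustment (per the description; exact placement in the
preempt header is assumed):

   #ifdef CONFIG_PREEMPT_RT
   /* A held 'sleeping' spinlock does not touch the preempt count. */
   # define PREEMPT_LOCK_OFFSET    0
   #else
   # define PREEMPT_LOCK_OFFSET    PREEMPT_OFFSET
   #endif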

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner d3e4782a4e rtmutex: add sleeping lock implementation
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner 63a7b1d0ca rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Required for the lock implementations built on top of rtmutex.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner 2b74b5dd97 rtmutex: Provide rt_mutex_slowlock_locked()
This is the inner part of rt_mutex_slowlock(), required for rwsem-rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Sebastian Andrzej Siewior b50141e955 rbtree: don't include the rcu header
The RCU header pulls in spinlock.h and fails due to not-yet-defined types:

|In file included from include/linux/spinlock.h:275:0,
|                 from include/linux/rcupdate.h:38,
|                 from include/linux/rbtree.h:34,
|                 from include/linux/rtmutex.h:17,
|                 from include/linux/spinlock_types.h:18,
|                 from kernel/bounds.c:13:
|include/linux/rwlock_rt.h:16:38: error: unknown type name ‘rwlock_t’
| extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
|                                      ^

This patch moves the required RCU function from the rcupdate.h header file into
a new header file which can be included by both users.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner bbfb297717 rtmutex: Avoid include hell
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner 8d03371f8a spinlock: Split the lock types header
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner 9b58817f04 rtmutex: Make lock_killable work
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.

Use signal_pending_state() unconditionally.
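
A sketch of the slowpath check after the change (loop shape simplified):

   if (unlikely(signal_pending_state(state, current))) {
           /* Covers TASK_KILLABLE (fatal signals only) as well as
            * TASK_INTERRUPTIBLE, instead of hard-coding the latter. */
           ret = -EINTR;
           break;
   }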

Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner 64474cb7c1 rtmutex: Add rtmutex_lock_killable()
Add "killable" type to rtmutex. We need this since rtmutex are used as
"normal" mutexes which do use this type.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:25 +03:00
Wolfgang M. Reimer f4d2cdae88 locking: locktorture: Do NOT include rwlock.h directly
Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Wolfgang M. Reimer <linuxball@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Grygorii Strashko bf31352d9f pid.h: include atomic.h
This patch fixes a build error:
  CC      kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
   atomic_inc(&pid->count);
   ^
which happens when
 CONFIG_PROVE_LOCKING=n
 CONFIG_DEBUG_SPINLOCK=n
 CONFIG_DEBUG_MUTEXES=n
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_PID_NS=y

Vanilla gets this via spinlock.h.
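
The fix, per the title, is to make the dependency explicit (sketch of the
one-line change):

   /* include/linux/pid.h */
   #include <linux/atomic.h>       /* for atomic_inc(&pid->count) */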

Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Thomas Gleixner cc08805ba8 futex: Ensure lock/unlock symmetry versus pi_lock and hash bucket lock
In exit_pi_state_list() we have the following locking construct:

   spin_lock(&hb->lock);
   raw_spin_lock_irq(&curr->pi_lock);

   ...
   spin_unlock(&hb->lock);

In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.

Fix this by dropping pi_lock before unlocking hb->lock and reacquiring
pi_lock after that again. This is safe as the loop code reevaluates
head again under the pi_lock.
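
In sketch form:

   raw_spin_unlock_irq(&curr->pi_lock);  /* drop pi_lock first ... */
   spin_unlock(&hb->lock);               /* ... so this unlock sees !atomic */
   raw_spin_lock_irq(&curr->pi_lock);    /* reacquire; the loop re-reads head */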

Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:25 +03:00
Steven Rostedt 421d4b32b1 futex: Fix bug on when a requeued RT task times out
Requeue with timeout causes a bug with PREEMPT_RT.

The bug comes from a timed-out condition.

	TASK 1				TASK 2
	------				------
    futex_wait_requeue_pi()
	futex_wait_queue_me()
	<timed out>

					double_lock_hb();

	raw_spin_lock(pi_lock);
	if (current->pi_blocked_on) {
	} else {
	    current->pi_blocked_on = PI_WAKE_INPROGRESS;
	    raw_spin_unlock(pi_lock);
	    spin_lock(hb->lock); <-- blocked!

					plist_for_each_entry_safe(this) {
					    rt_mutex_start_proxy_lock();
						task_blocks_on_rt_mutex();
						BUG_ON(task->pi_blocked_on)!!!!

The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set "pi_blocked_on" to the hb->lock.

When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead
set to the hb->lock.

The fix:

When calling rt_mutex_start_proxy_lock(), a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag, PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and it will handle things
appropriately.
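
A sketch of that early exit, reconstructed from the description (shape
simplified):

   /* In rt_mutex_start_proxy_lock(): */
   raw_spin_lock(&task->pi_lock);
   if (task->pi_blocked_on) {
           /* Task already blocked (e.g. on hb->lock): bail out early. */
           raw_spin_unlock(&task->pi_lock);
           return -EAGAIN;
   }
   task->pi_blocked_on = PI_REQUEUE_INPROGRESS;
   raw_spin_unlock(&task->pi_lock);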

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:24 +03:00
Thomas Gleixner 83fddd55bf rtmutex: Handle the various new futex race conditions
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to the futex hash bucket lock being a 'sleeping' spinlock and
therefore not disabling preemption.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:24 +03:00
Sebastian Andrzej Siewior 597fbf406e net/core: use local_bh_disable() in netif_rx_ni()
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() plus a do_softirq() call with a check for pending softirqs.
The do_softirq() part is required because netif_rx() raises the softirq
but does not invoke it. The preempt_disable() is required to remain on the
same CPU which added the skb to the per-CPU list.
All this can be avoided by putting it into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
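
A sketch of the resulting function (the internal helper name is an
assumption):

   int netif_rx_ni(struct sk_buff *skb)
   {
           int err;

           local_bh_disable();             /* keeps us on this CPU */
           err = netif_rx_internal(skb);   /* raises NET_RX_SOFTIRQ */
           local_bh_enable();              /* runs do_softirq() if pending */

           return err;
   }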

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:24 +03:00