Commit Graph

64767 Commits

Author SHA1 Message Date
Nobody 0ce81bc93c Linux 3.14 with MCST patches 2020-10-14 02:51:01 +03:00
Nobody 3878bb6f09 unionfs: apply unionfs patch 2020-10-14 01:17:03 +03:00
Frederic Weisbecker fc9c01480b irq_work: Split raised and lazy lists
commit b93e0b8fa8 upstream.

An irq work can be handled from two places: from the tick if the work
carries the "lazy" flag and the tick is periodic, or from a self IPI.

All these works are merged into a single list, and a per-CPU latch is used
to avoid raising a self-IPI when one is already pending.

We could do away with this ugly latch if the list were made only of
non-lazy works: just enqueueing a work on an empty list would be enough
to know whether we need to raise an IPI.

Also we are going to implement remote irq work queuing. Then the per CPU
latch will need to become atomic in the global scope. That's too bad
because, here as well, just enqueueing a work on an empty list of
non-lazy works would be enough to know if we need to raise an IPI or not.

So let's take a way out of this: split the works into two distinct lists,
one for the works that can be handled by the next tick and another
one for those handled by the IPI. Just checking whether the latter is empty
when we queue a new work is enough to know if we need to raise an IPI.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

Conflicts:
	kernel/irq_work.c

Merged in some changes from 4.0-rt that added the irq_work_tick()
code, and that also have the raised_list called from hardirq context and
the lazy_list always called from softirq context (which is threaded on RT).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
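
A minimal sketch of the split-list enqueue path described above (illustrative
only, not the literal patch; it assumes the usual per-CPU llist and irq_work
helpers):

  #include <linux/irq_work.h>
  #include <linux/llist.h>
  #include <linux/percpu.h>
  #include <linux/tick.h>

  static DEFINE_PER_CPU(struct llist_head, raised_list);
  static DEFINE_PER_CPU(struct llist_head, lazy_list);

  static void sketch_irq_work_queue(struct irq_work *work)
  {
      if (work->flags & IRQ_WORK_LAZY) {
          /* Lazy work waits for the next tick, unless the tick is stopped. */
          if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
              tick_nohz_tick_stopped())
              arch_irq_work_raise();
      } else {
          /* llist_add() returns true only when the list was empty, which is
           * exactly the "do we need to raise a self-IPI?" question. */
          if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
              arch_irq_work_raise();
      }
  }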
2020-10-14 00:59:25 +03:00
Peter Zijlstra f25a5ec582 irq_work: Introduce arch_irq_work_has_interrupt()
commit c5c38ef3d7 upstream.

The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.

Let's introduce arch_irq_work_has_interrupt(), which architectures can
override to report whether they support this ability.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
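
A hedged sketch of the opt-in shape (the real mechanism goes through per-arch
asm headers; the generic fallback below is only illustrative):

  /* Generic default: no self-IPI available, callers fall back to the tick. */
  #ifndef arch_irq_work_has_interrupt
  static inline bool arch_irq_work_has_interrupt(void)
  {
      return false;
  }
  #endif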
2020-10-14 00:59:25 +03:00
Marcelo Tosatti ff8bba4038 KVM: use simple waitqueue for vcpu->wq
The problem:

On -RT, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:

This patch reduces the average latency in my tests from 14us to 11us.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
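
A rough sketch of why a raw-spinlock-based wait head stays wakeable from hard
interrupt context on -RT (the structure and names are illustrative, not the
actual simple-waitqueue API):

  struct simple_wait_head {
      raw_spinlock_t   lock;    /* stays a real spinning lock on -RT */
      struct list_head waiters;
  };

  struct simple_waiter {
      struct task_struct *task;
      struct list_head    node;
  };

  /* Legal from hard interrupt context: only a raw spinlock is taken. */
  static void simple_wake_one(struct simple_wait_head *h)
  {
      struct simple_waiter *w;
      unsigned long flags;

      raw_spin_lock_irqsave(&h->lock, flags);
      if (!list_empty(&h->waiters)) {
          w = list_first_entry(&h->waiters, struct simple_waiter, node);
          wake_up_process(w->task);
      }
      raw_spin_unlock_irqrestore(&h->lock, flags);
  }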
2020-10-14 00:59:25 +03:00
Kevin Hao 085ce53a6b netpoll: guard the access to dev->npinfo with rcu_read_lock/unlock_bh() for CONFIG_PREEMPT_RT_FULL=y
For a vanilla kernel we don't need to invoke rcu_read_lock/unlock_bh()
explicitly to mark an RCU-bh critical section in softirq context,
because bh is already disabled in this case. But for an RT kernel,
the commit ("rcu: Merge RCU-bh into RCU-preempt") implements
RCU-bh in terms of RCU-preempt. So we have to use
rcu_read_lock/unlock_bh() to mark an RCU-bh critical section even in
softirq context. Otherwise we will get a call trace like this:
  include/linux/netpoll.h:90 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
  rcu_scheduler_active = 1, debug_locks = 0
  1 lock held by irq/177-eth0_g0/129:
   #0:  (&per_cpu(local_softirq_locks[i], __cpu).lock){+.+...}, at: [<8002f544>] do_current_softirqs+0x12c/0x5ec

  stack backtrace:
  CPU: 0 PID: 129 Comm: irq/177-eth0_g0 Not tainted 3.14.23 #11
  [<80018c0c>] (unwind_backtrace) from [<800138b0>] (show_stack+0x20/0x24)
  [<800138b0>] (show_stack) from [<8075c3bc>] (dump_stack+0x84/0xd0)
  [<8075c3bc>] (dump_stack) from [<8008111c>] (lockdep_rcu_suspicious+0xe8/0x11c)
  [<8008111c>] (lockdep_rcu_suspicious) from [<805e94e8>] (dev_gro_receive+0x240/0x724)
  [<805e94e8>] (dev_gro_receive) from [<805e9c34>] (napi_gro_receive+0x3c/0x1e8)
  [<805e9c34>] (napi_gro_receive) from [<804b01ac>] (gfar_clean_rx_ring+0x2d4/0x624)
  [<804b01ac>] (gfar_clean_rx_ring) from [<804b078c>] (gfar_poll_rx_sq+0x58/0xe8)
  [<804b078c>] (gfar_poll_rx_sq) from [<805eada8>] (net_rx_action+0x1c8/0x418)
  [<805eada8>] (net_rx_action) from [<8002f62c>] (do_current_softirqs+0x214/0x5ec)
  [<8002f62c>] (do_current_softirqs) from [<8002fa88>] (__local_bh_enable+0x84/0x9c)
  [<8002fa88>] (__local_bh_enable) from [<8002fab8>] (local_bh_enable+0x18/0x1c)
  [<8002fab8>] (local_bh_enable) from [<80093924>] (irq_forced_thread_fn+0x50/0x74)
  [<80093924>] (irq_forced_thread_fn) from [<80093c30>] (irq_thread+0x158/0x1c4)
  [<80093c30>] (irq_thread) from [<800555b8>] (kthread+0xd4/0xe8)
  [<800555b8>] (kthread) from [<8000ee88>] (ret_from_fork+0x14/0x20)

Signed-off-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
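
A minimal sketch of the guarded access pattern (simplified; the helper name
and the bare NULL check are illustrative):

  #include <linux/netdevice.h>
  #include <linux/rcupdate.h>

  static bool sketch_netpoll_rx_on(struct net_device *dev)
  {
      bool ret;

      /* On PREEMPT_RT_FULL softirq context no longer implies an RCU-bh
       * read-side critical section, so mark it explicitly. */
      rcu_read_lock_bh();
      ret = rcu_dereference_bh(dev->npinfo) != NULL;
      rcu_read_unlock_bh();

      return ret;
  }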
2020-10-14 00:59:25 +03:00
Steven Rostedt (Red Hat) f13b372196 Revert "timers: do not raise softirq unconditionally"
This reverts commit 891f510568343d93c5aa2f477b6bebe009b48f05.

An issue arose where, if an rt_mutex (a spin_lock converted to a mutex
in PREEMPT_RT) is taken in hard interrupt context, it could cause
a false deadlock detection and trigger a BUG_ON() from the return
value of task_blocks_on_rt_mutex() in rt_spin_lock_slowlock().

The problem is this:

    CPU0			CPU1
    ----			----
  spin_lock(A)
				spin_lock(A)
			[ blocks, but spins as owner on
			  CPU 0 is running ]

				<interrupt>
					spin_trylock(B)
					[ succeeds ]

  spin_lock(B)
  <blocks>

Now the deadlock detection triggers and follows the locking:

  Task X (on CPU0) blocked on spinlock B owned by task Y on
  CPU1 (via the interrupt taking it with a try lock)

  The owner of B (Y) is blocked on spin_lock A (still spinning)
  A is owned by task X (self). DEADLOCK detected! BUG_ON triggered.

This was caused by the code that tries not to raise the softirq
unconditionally, in order to let NO_HZ_FULL work. Unfortunately, reverting
that patch causes NO_HZ_FULL to break again, but that's still better than
triggering a BUG_ON().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	kernel/timer.c
2020-10-14 00:59:25 +03:00
Sebastian Andrzej Siewior d12e92f871 Revert "rwsem-rt: Do not allow readers to nest"
This behaviour is required by cpufreq and its logic is "okay": it does a
read_lock followed by a try_read_lock.
Lockdep warns if one tries a read_lock twice, on -RT and vanilla alike, so it
should be fine. We still only allow multiple readers as long as they are in
the same process.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Daniel Wagner 7dafcfee7b work-simple: Simple work queue implementation
Provides a PREEMPT_RT_FULL-safe framework for enqueuing callbacks from
irq context. The callbacks are executed in kthread context.

Based on wait-simple.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Thomas Gleixner 0f48389512 rtmutex: Confine deadlock logic to futex
upstream commit: c051b21f71

The deadlock logic is only required for futexes.

Remove the extra arguments from the public functions and also from the
futex-specific ones, which are always called with deadlock detection
enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	include/linux/rtmutex.h
	kernel/locking/rtmutex.c
2020-10-14 00:59:23 +03:00
Thomas Gleixner 0d3b12ccc6 completion: Use simple wait queues
Completions have no long-lasting callbacks and therefore do not need
the complex waitqueue variant. Use simple waitqueues, which reduces the
contention on the waitqueue lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:23 +03:00
Paul Gortmaker ca0179a36a simple-wait: rename and export the equivalent of waitqueue_active()
The function "swait_head_has_waiters()" was internalized into
wait-simple.c but it parallels the waitqueue_active of normal
waitqueue support. Given that there are over 150 waitqueue_active
users in drivers/ fs/ kernel/ and the like, lets make it globally
visible, and rename it to parallel the waitqueue_active accordingly.
We'll need to do this if we expect to expand its usage beyond RT.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:23 +03:00
Thomas Gleixner 65870d64d3 wait-simple: Rework for use with completions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:23 +03:00
Thomas Gleixner 99076b3731 wait-simple: Simple waitqueue implementation
wait_queue is a Swiss army knife, and in most cases the
complexity is not needed. For RT, waitqueues are a constant source of
trouble as we can't convert the head lock to a raw spinlock due to
fancy and long-lasting callbacks.

Provide a slim version, which allows RT to replace wait queues. This
should go mainline as well, as it lowers memory consumption and
runtime overhead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

smp_mb() added by Steven Rostedt to fix a race condition with swait
wakeups vs adding items to the list.
2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior e49d664aa7 wait.h: include atomic.h
|  CC      init/main.o
|In file included from include/linux/mmzone.h:9:0,
|                 from include/linux/gfp.h:4,
|                 from include/linux/kmod.h:22,
|                 from include/linux/module.h:13,
|                 from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
|  if (atomic_read(val) == 0)
|  ^

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner b33545443e sched: Add support for lazy preemption
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT suffers throughput-wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away
preempt the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the waker. In
mainline this is prevented by the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.

Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.

So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as, for laziness reasons, I coupled this to
migrate_disable/enable, so some other mechanisms get the same treatment
(e.g. get_cpu_light).

Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.

For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.

Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)

The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:

 # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features

and reenabled via

 # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features

The test results so far are very machine- and workload-dependent, but
there is a clear trend that it enhances non-RT workload
performance.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
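
A rough sketch of the wakeup-time decision (illustrative; TIF_NEED_RESCHED_LAZY
and the helper name come from the -rt series, not mainline):

  static void sketch_resched_task(struct task_struct *curr, bool lazy_allowed)
  {
      if (lazy_allowed && curr->policy == SCHED_NORMAL) {
          /* SCHED_OTHER preempting SCHED_OTHER: defer, so the waker can
           * leave its lock-held region before being preempted. */
          set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
      } else {
          /* RT classes keep the immediate NEED_RESCHED behaviour. */
          set_tsk_need_resched(curr);
      }
  }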
2020-10-14 00:59:22 +03:00
Thomas Gleixner f4d0ede105 softirq: Split softirq locks
The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. However, this prevents us from handling the various softirqs
at different priorities. Now, instead of reintroducing the split
softirq threads, we split the locks which serialize the softirq
processing.

If a softirq is raised in the context of a thread, then the softirq is
noted in a per-thread field, provided the thread is in a bh-disabled
region. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh-disabled region, it tries to execute
the softirqs which have been raised in its own context. It acquires
the per-softirq / per-CPU lock for the softirq and then checks
whether the softirq is still pending in the per-CPU
local_softirq_pending() field. If yes, it runs the softirq. If no,
then some other task has already executed it. This allows for zero-config
softirq elevation in the context of user space tasks or interrupt
threads.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
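
A simplified sketch of the per-softirq serialization, as if inside
kernel/softirq.c on -RT (the lock structure and names are invented for
illustration; clearing of the pending bit is omitted):

  struct softirq_lock {
      spinlock_t lock;    /* a sleeping lock on -RT, may be held across preemption */
  };
  static DEFINE_PER_CPU(struct softirq_lock, softirq_locks[NR_SOFTIRQS]);

  static void sketch_run_one_softirq(unsigned int nr)
  {
      struct softirq_lock *sl = this_cpu_ptr(&softirq_locks[nr]);

      spin_lock(&sl->lock);
      /* Re-check: another task may have executed this vector already. */
      if (local_softirq_pending() & (1U << nr))
          softirq_vec[nr].action(&softirq_vec[nr]);
      spin_unlock(&sl->lock);
  }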
2020-10-14 00:59:22 +03:00
Thomas Gleixner 2285aafeb8 softirq: Make serving softirqs a task flag
Avoid the percpu softirq_runner pointer magic by using a task flag.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner 44eb2c6fd2 softirq: Check preemption after reenabling interrupts
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after re-enabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.

In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.

Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
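
A sketch of the pattern used at the raising sites (hedged: the
preempt_check_resched_rt() helper is assumed to be the one the -rt series
adds for this; the surrounding function is illustrative):

  static void sketch_raise_and_check(unsigned int nr)
  {
      unsigned long flags;

      local_irq_save(flags);
      raise_softirq_irqoff(nr);    /* may wake the softirq thread */
      local_irq_restore(flags);
      /* Without this, the woken softirq thread may be delayed arbitrarily. */
      preempt_check_resched_rt();
  }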
2020-10-14 00:59:21 +03:00
Thomas Gleixner f46faf6ae0 net: netfilter: Serialize xt_write_recseq sections on RT
The netfilter code relies only on the implicit semantics of
local_bh_disable() for serializing xt_write_recseq sections. RT breaks
that and needs explicit serialization here.

Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2020-10-14 00:59:21 +03:00
Steven Rostedt 17618e0e1f cpu/rt: Rework cpu down for PREEMPT_RT
Bringing a CPU down is a pain with the PREEMPT_RT kernel because
tasks can be preempted in many more places than in non-RT. In
order to handle per_cpu variables, tasks may be pinned to a CPU
for a while, and even sleep. But these tasks need to be off the CPU
if that CPU is going down.

Several synchronization methods have been tried, but when stressed
they failed. This is a new approach.

A sync_tsk thread is still created and tasks may still block on a
lock when the CPU is going down, but how that works is a bit different.
When cpu_down() starts, it will create the sync_tsk and wait for it
to report that the tasks currently pinned to the CPU are no longer
pinned. But new tasks that are about to be pinned will still be allowed
to do so at this time.

Then the notifiers are called. Several notifiers will bring down tasks
that will enter these locations. Some of these tasks will take locks
of other tasks that are on the CPU. If we don't let those other tasks
continue, but make them block until CPU down is done, the tasks that
the notifiers are waiting on will never complete as they are waiting
for the locks held by the tasks that are blocked.

Thus we still let the task pin the CPU until the notifiers are done.
After the notifiers run, we then make new tasks entering the pinned
CPU sections grab a mutex and wait. This mutex is now a per CPU mutex
in the hotplug_pcp descriptor.

To help things along, a new function called migrate_me() is created
in the scheduler code. This function will try to migrate the current task
off the CPU that is going down, if possible. When the sync_tsk is created,
all tasks will then try to migrate off the CPU going down. There are
several cases where this won't work, but it helps in most cases.

After the notifiers are called, if a task can't migrate off but enters
the pinned-CPU sections, it will be forced to wait on the hotplug_pcp mutex
until the CPU down is complete. Then the scheduler will force the migration
anyway.

Also, I found that THREAD_BOUND tasks need to be accounted for on the
pinned CPU as well, and migrate_disable no longer treats them specially.
This helps fix issues with ksoftirqd and workqueues that unbind on CPU down.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:21 +03:00
Nicholas Mc Guire 7f228806ab seqlock: consolidate spin_lock/unlock waiting with spin_unlock_wait
Since c2f21ce ("locking: Implement new raw_spinlock"),
include/linux/spinlock.h provides spin_unlock_wait() to wait for a concurrent
holder of a lock. This patch just moves over to that API. spin_unlock_wait()
covers both raw_spinlock_t and spinlock_t, so it should be safe here as well.
The added RT variant of read_seqbegin() in include/linux/seqlock.h that is being
modified was introduced by the patch:
  seqlock-prevent-rt-starvation.patch

Behavior should be unchanged.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:21 +03:00
Thomas Gleixner 8ef1b9e61c seqlock: Prevent rt starvation
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.

To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.

For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for disentangling some VFS code to make that
possible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
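
A sketch of the resulting RT reader path (simplified; it assumes the seqlock_t
layout with an embedded spinlock):

  static unsigned sketch_rt_read_seqbegin(seqlock_t *sl)
  {
      unsigned ret;

      ret = ACCESS_ONCE(sl->seqcount.sequence);
      while (unlikely(ret & 1)) {
          /* A writer is active: block on its lock so rtmutex PI boosts the
           * preempted writer instead of spinning forever. */
          spin_unlock_wait(&sl->lock);
          ret = ACCESS_ONCE(sl->seqcount.sequence);
      }
      smp_rmb();
      return ret;
  }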
2020-10-14 00:59:21 +03:00
Thomas Gleixner 41d88b03da random: Make it work on rt
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2020-10-14 00:59:21 +03:00
Steven Rostedt ace81409f9 acpi/rt: Convert acpi_gbl_hardware lock back to a raw_spinlock_t
We hit the following bug with 3.6-rt:

[    5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002
[    5.898991] no locks held by swapper/3/0.
[    5.898993] Modules linked in:
[    5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[    5.898997] Call Trace:
[    5.899011]  [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[    5.899028]  [<ffffffff81577923>] __schedule+0x793/0x7a0
[    5.899032]  [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[    5.899034]  [<ffffffff81577b89>] schedule+0x29/0x70
[    5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002
[    5.899037] no locks held by swapper/7/0.
[    5.899039]  [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[    5.899040] Modules linked in:
[    5.899041]
[    5.899045]  [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[    5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[    5.899047] Call Trace:
[    5.899049]  [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[    5.899052]  [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[    5.899054]  [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[    5.899056]  [<ffffffff81577923>] __schedule+0x793/0x7a0
[    5.899059]  [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[    5.899062]  [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[    5.899068]  [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0
[    5.899071]  [<ffffffff81577b89>] schedule+0x29/0x70
[    5.899072]  [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51
[    5.899074]  [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[    5.899077]  [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e
[    5.899079]  [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[    5.899081]  [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30
[    5.899083]  [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[    5.899087]  [<ffffffff8144c759>] cpuidle_enter+0x19/0x20
[    5.899088]  [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[    5.899090]  [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50
[    5.899092]  [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[    5.899094]  [<ffffffff8144d1a1>] cpuidle899101]  [<ffffffff8130be13>] ?

As the ACPI code disables interrupts in acpi_idle_enter_bm, and calls
code that grabs the ACPI lock, it causes issues because on RT that lock is
currently a sleeping lock.

The lock was converted from a raw to a sleeping lock due to some
previous issues, and tests that showed it didn't seem to matter.
Unfortunately, it did matter for one of our boxes.

This patch converts the lock back to a raw lock. I've run this code on a
few of my own machines, one being my laptop, which uses ACPI quite
extensively. I've been able to suspend and resume without issues.

[ tglx: Made the change exclusive for acpi_gbl_hardware_lock ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: John Kacur <jkacur@gmail.com>
Cc: Clark Williams <clark@redhat.com>
Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:20 +03:00
Thomas Gleixner 4fa1901f89 arm-enable-highmem-for-rt.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:20 +03:00
Peter Zijlstra c9987e379e mm, rt: kmap_atomic scheduling
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course), this should be esp easy now
that we have a kmap_atomic stack.

Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins

[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
		     and the pte content right away in the task struct.
		     Shortens the context switch code. ]
2020-10-14 00:59:20 +03:00
Jason Wessel 0c57f4abac kgdb/serial: Short term workaround
On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
>  - KGDB (not yet disabled) is reportedly unusable on -rt right now due
>    to missing hacks in the console locking which I dropped on purpose.
>

To work around this in the short term you can use this patch, in
addition to the clocksource watchdog patch that Thomas brewed up.

Comments are welcome of course.  Ultimately the right solution is to
change separation between the console and the HW to have a polled mode
+ work queue so as not to introduce any kind of latency.

Thanks,
Jason.
2020-10-14 00:59:20 +03:00
Carsten Emde 2840ec0444 net: sysrq via icmp
There are (probably rare) situations when a system has crashed and the system
console becomes unresponsive but the network ICMP layer is still alive.
Wouldn't it be wonderful if we could then submit a sysrq command via ping?

This patch provides this facility. Please consult the updated documentation
Documentation/sysrq.txt for details.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
2020-10-14 00:59:20 +03:00
Sebastian Andrzej Siewior 6b2a704800 irq_work: allow certain work in hard irq context
irq_work is processed in softirq context on -RT because we want to avoid
long latencies which might arise from processing lots of perf events.
The noHZ-full mode requires its callback to be called from real hardirq
context (commit 76c24fb ("nohz: New APIs to re-evaluate the tick on full
dynticks CPUs")). If it is called from a thread context we might get
wrong results for checks like "is_idle_task(current)".
This patch introduces a second list (hirq_work_list) which will be used
if irq_work_run() has been invoked from hardirq context, and which processes
only work items marked with IRQ_WORK_HARD_IRQ.

This patch also removes arch_irq_work_raise() from sparc & powerpc, as
is already done for x86. At least for powerpc it is somewhat
superfluous because it is called from the timer interrupt, which should
invoke update_process_times().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
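
A simplified sketch of the routing decision (IRQ_WORK_HARD_IRQ and the list
names come from the -rt series; the deferral path shown is schematic):

  static DEFINE_PER_CPU(struct llist_head, hirq_work_list);
  static DEFINE_PER_CPU(struct llist_head, lazy_work_list);

  static void sketch_irq_work_enqueue(struct irq_work *work)
  {
      if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
          (work->flags & IRQ_WORK_HARD_IRQ)) {
          /* Must run from the real hardirq handler (nohz_full needs this). */
          if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
              arch_irq_work_raise();
      } else {
          /* Everything else is deferred to softirq context on -RT. */
          if (llist_add(&work->llnode, this_cpu_ptr(&lazy_work_list)))
              raise_softirq(TIMER_SOFTIRQ);
      }
  }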
2020-10-14 00:59:19 +03:00
Thomas Gleixner 26af336d12 use skbufhead with raw lock
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:19 +03:00
Thomas Gleixner 0eb8f356f4 jump-label-rt.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:19 +03:00
Thomas Gleixner 3126470072 idr: Use local lock instead of preempt enable/disable
We need to protect the per cpu variable and prevent migration.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
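
A sketch assuming the -rt locallock primitives (DEFINE_LOCAL_IRQ_LOCK /
local_lock / local_unlock); the function names are illustrative:

  static DEFINE_LOCAL_IRQ_LOCK(idr_lock);

  static void sketch_idr_preload(gfp_t gfp_mask)
  {
      /* Replaces preempt_disable(): pins the task to this CPU and protects
       * the per-CPU preload buffer, while sleeping allocations stay legal
       * on -RT. */
      local_lock(idr_lock);
      /* ... refill this CPU's idr preload buffer ... */
  }

  static void sketch_idr_preload_end(void)
  {
      local_unlock(idr_lock);    /* replaces preempt_enable() */
  }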
2020-10-14 00:59:19 +03:00
Steven Rostedt 58463fcb2b rt: Make cpu_chill() use hrtimer instead of msleep()
Ulrich Obergfell pointed out that cpu_chill() calls msleep(), which is woken
up by ksoftirqd running the TIMER softirq. But as cpu_chill() is
called from softirq context, it may block ksoftirqd from running, in
which case the msleep() may never be woken up, causing the deadlock.

I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.

The solution is not to convert the cpu_chill() back to a cpu_relax() as that
will re-create a possible live lock that the cpu_chill() fixed earlier, and may
also leave this bug open on other softirqs. The fix is to remove the
dependency on ksoftirqd from cpu_chill(). That is, instead of calling
msleep() that requires ksoftirqd to wake it up, use the
hrtimer_nanosleep() code that does the wakeup from hard irq context.

|Looks to be the lock of the block softirq. I don't have the core dump
|anymore, but from what I could tell the ksoftirqd was blocked on the
|block softirq lock, where the block softirq handler did a msleep
|(called by the qla2xxx interrupt handler).
|
|Looking at trigger_softirq() in block/blk-softirq.c, it can do a
|smp_callfunction() to another cpu to run the block softirq. If that
|happens to be the cpu where the qla2xx irq handler is doing the block
|softirq and is in a middle of a msleep(), I believe the ksoftirqd will
|try to run the softirq. If it does that, then BOOM, it's deadlocked
|because the ksoftirqd will never run the timer softirq either.

|I should have also stated that it was only one lock that was involved.
|But the lock owner was doing a msleep() that requires a wakeup by
|ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
|held by the msleep() caller, then you have your deadlock.
|
|It's best not to have any softirqs going to sleep requiring another
|softirq to wake it up. Note, if we ever require a timer softirq to do a
|cpu_chill() it will most definitely hit this deadlock.

Cc: stable-rt@vger.kernel.org
Found-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: add the 4 | chapters from email]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
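
A minimal sketch of the hrtimer-based cpu_chill() described above (simplified):

  #include <linux/hrtimer.h>
  #include <linux/ktime.h>

  void sketch_cpu_chill(void)
  {
      struct timespec tu = {
          .tv_nsec = NSEC_PER_SEC / HZ,    /* roughly one tick */
      };

      /* The wakeup comes from the hrtimer in hard irq context, so it no
       * longer depends on ksoftirqd making progress. */
      hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
  }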
2020-10-14 00:59:18 +03:00
Thomas Gleixner b1a75a3071 rt: Introduce cpu_chill()
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
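
A sketch of the API shape (hedged; it mirrors the delay.h hunk in the -rt
series):

  #ifdef CONFIG_PREEMPT_RT_FULL
  extern void cpu_chill(void);    /* sleeps for about a tick on -RT */
  #else
  # define cpu_chill()    cpu_relax()
  #endif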
2020-10-14 00:59:18 +03:00
Mike Galbraith 8be91122a8 stomp-machine: create lg_global_trylock_relax() primitive
Create lg_global_trylock_relax() for use by stopper thread when it cannot
schedule, to deal with stop_cpus_lock, which is now an lglock.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:18 +03:00
Thomas Gleixner 121435cad6 lglocks-rt.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:17 +03:00
Paul E. McKenney 010ec70629 rcu: Make ksoftirqd do RCU quiescent states
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
to network-based denial-of-service attacks.  This patch therefore
makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
is running in ksoftirqd context.  A wrapper layer is interposed so that
other calls to __do_softirq() avoid invoking rcu_bh_qs().  The underlying
function __do_softirq_common() does the actual work.

The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
that there might be a local_bh_enable() inside an RCU-preempt read-side
critical section.  This local_bh_enable() can invoke __do_softirq()
directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
read-side critical section.  Therefore, quiescent states can only happen
in cases where __do_softirq() is invoked directly from ksoftirqd.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
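
A sketch of the wrapper layering (names simplified and illustrative):

  static void __sketch_do_softirq_common(bool need_rcu_bh_qs)
  {
      if (need_rcu_bh_qs)
          rcu_bh_qs(smp_processor_id());    /* only legal from ksoftirqd */
      /* ... the actual softirq processing loop ... */
  }

  static void sketch_do_softirq(void)
  {
      /* Direct callers, e.g. local_bh_enable() inside an RCU-preempt
       * read-side section, must not report a quiescent state. */
      __sketch_do_softirq_common(false);
  }

  static void sketch_run_ksoftirqd(void)
  {
      __sketch_do_softirq_common(true);
  }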
2020-10-14 00:59:17 +03:00
Thomas Gleixner 04b4e502bb rcu: Merge RCU-bh into RCU-preempt
The Linux kernel has long RCU-bh read-side critical sections that
intolerably increase scheduling latency under mainline's RCU-bh rules,
which include RCU-bh read-side critical sections being non-preemptible.
This patch therefore arranges for RCU-bh to be implemented in terms of
RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.

This has the downside of defeating the purpose of RCU-bh, namely,
handling the case where the system is subjected to a network-based
denial-of-service attack that keeps at least one CPU doing full-time
softirq processing.  This issue will be fixed by a later commit.

The current commit will need some work to make it appropriate for
mainline use, for example, it needs to be extended to cover Tiny RCU.

[ paulmck: Added a useful changelog ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
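
A sketch of the mapping for CONFIG_PREEMPT_RT_FULL=y (simplified; the real
patch also redirects call_rcu_bh(), rcu_barrier_bh() and friends):

  #ifdef CONFIG_PREEMPT_RT_FULL
  static inline void sketch_rcu_read_lock_bh(void)
  {
      local_bh_disable();
      rcu_read_lock();    /* an RCU-bh reader becomes an RCU-preempt reader */
  }

  static inline void sketch_rcu_read_unlock_bh(void)
  {
      rcu_read_unlock();
      local_bh_enable();
  }
  #endif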
2020-10-14 00:59:17 +03:00
Sebastian Andrzej Siewior 493d266dbd rtmutex: use a trylock for waiter lock in trylock
Mike Galbraith captured the following:
| >#11 [ffff88017b243e90] _raw_spin_lock at ffffffff815d2596
| >#12 [ffff88017b243e90] rt_mutex_trylock at ffffffff815d15be
| >#13 [ffff88017b243eb0] get_next_timer_interrupt at ffffffff81063b42
| >#14 [ffff88017b243f00] tick_nohz_stop_sched_tick at ffffffff810bd1fd
| >#15 [ffff88017b243f70] tick_nohz_irq_exit at ffffffff810bd7d2
| >#16 [ffff88017b243f90] irq_exit at ffffffff8105b02d
| >#17 [ffff88017b243fb0] reschedule_interrupt at ffffffff815db3dd
| >--- <IRQ stack> ---
| >#18 [ffff88017a2a9bc8] reschedule_interrupt at ffffffff815db3dd
| >    [exception RIP: task_blocks_on_rt_mutex+51]
| >#19 [ffff88017a2a9ce0] rt_spin_lock_slowlock at ffffffff815d183c
| >#20 [ffff88017a2a9da0] lock_timer_base.isra.35 at ffffffff81061cbf
| >#21 [ffff88017a2a9dd0] schedule_timeout at ffffffff815cf1ce
| >#22 [ffff88017a2a9e50] rcu_gp_kthread at ffffffff810f9bbb
| >#23 [ffff88017a2a9ed0] kthread at ffffffff810796d5
| >#24 [ffff88017a2a9f50] ret_from_fork at ffffffff815da04c

lock_timer_base() does a try_lock() which deadlocks on the waiter lock,
not the lock itself.
This patch takes the waiter_lock with trylock so it should work from interrupt
context as well. If the fastpath doesn't work and the waiter_lock itself is
taken, then it is likely that the lock itself is taken.
This patch also adds "rt_spin_unlock_after_trylock_in_irq" to keep lockdep
happy. If we managed to take the wait_lock in the first place we should also
be able to take it in the unlock path.

Cc: stable-rt@vger.kernel.org
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
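
A sketch of the trylock-on-the-waiter-lock idea, as if inside
kernel/locking/rtmutex.c (simplified; try_to_take_rt_mutex() is the existing
internal helper):

  static int sketch_rt_mutex_trylock(struct rt_mutex *lock)
  {
      int ret;

      /* Do not block on wait_lock: blocking there from (soft)irq context is
       * what deadlocks in the trace above. */
      if (!raw_spin_trylock(&lock->wait_lock))
          return 0;
      ret = try_to_take_rt_mutex(lock, current, NULL);
      raw_spin_unlock(&lock->wait_lock);
      return ret;
  }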
2020-10-14 00:59:17 +03:00
Thomas Gleixner d3b8578de8 timers: do not raise softirq unconditionally
Mike,

On Thu, 7 Nov 2013, Mike Galbraith wrote:

> On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote:
> > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote:
>
> > > I bet you are trying to work around some of the side effects of the
> > > occasional tick which is still necessary despite of full nohz, right?
> >
> > Nope, I wanted to check out cost of nohz_full for rt, and found that it
> > doesn't work at all instead, looked, and found that the sole running
> > task has just awakened ksoftirqd when it wants to shut the tick down, so
> > that shutdown never happens.
>
> Like so in virgin 3.10-rt.  Box is x3550 M3 booted nowatchdog
> rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via
> cpusets as well.

well, that very same problem is in mainline if you add "threadirqs" to
the command line. But we can be smart about this. The untested patch
below should address that issue. If that works on mainline we can
adapt it for RT (needs a trylock(&base->lock) there).

Though it's not a full solution. It needs some thought versus the
softirq code of timers. Assume we have only one timer queued 1000
ticks into the future. So this change will cause the timer softirq not
to be called until that timer expires and then the timer softirq is
going to do 1000 loops until it catches up with jiffies. That's
anything but pretty ...

What worries me more is this one:

  pert-5229  [003] d..h1..   684.482618: softirq_raise: vec=9 [action=RCU]

The CPU has no callbacks as you shoved them over to cpu 0, so why is
the RCU softirq raised?

Thanks,

	tglx
------------------
Message-id: <alpine.DEB.2.02.1311071158350.23353@ionos.tec.linutronix.de>
|CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:17 +03:00
Thomas Gleixner 8c09ef3177 timer-handle-idle-trylock-in-get-next-timer-irq.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:17 +03:00
John Kacur d4001bfc52 rwlocks: Fix section mismatch
This fixes the following build error for the preempt-rt kernel.

make kernel/fork.o
  CC      kernel/fork.o
kernel/fork.c:90: error: section of tasklist_lock conflicts with previous declaration
make[2]: *** [kernel/fork.o] Error 1
make[1]: *** [kernel/fork.o] Error 2

The rt kernel cache-aligns the RWLOCK in DEFINE_RWLOCK by default.
The non-rt kernels explicitly cache-align only the tasklist_lock in
kernel/fork.c.
That can create a build conflict. This fixes the build problem by making the
non-rt kernels cache-align RWLOCKs by default. The side effect is that
the other RWLOCKs are also cache-aligned for non-rt.

This is a short term solution for rt only.
The longer term solution would be to push the cache aligned DEFINE_RWLOCK
to mainline. If there are objections, then we could create a
DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature.

Comments? Objections?

Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
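
A sketch of the cache-aligned definition (illustrative):

  #define DEFINE_RWLOCK(name) \
      rwlock_t name __cacheline_aligned_in_smp = __RW_LOCK_UNLOCKED(name)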
2020-10-14 00:59:17 +03:00
Nicholas Mc Guire a9200d7cde rt: Cleanup of unnecessary do while 0 in read/write _lock()
With the migration pushdown, a few of the do{ }while(0)
loops became obsolete but got left over - this patch
only removes this fallout.

Patch applies on top of 3.12.9-rt13

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:17 +03:00
Nicholas Mc Guire b939a242f6 read_lock migrate_disable pushdown to rt_read_lock
Pushdown of migrate_disable/enable from the read_*lock* to the rt_read_*lock*
API level.

general mapping to mutexes:

read_*lock*
  `-> rt_read_*lock*
          `-> __spin_lock (the sleeping spin locks)
                 `-> rt_mutex

The real read_lock* mapping:

          read_lock_irqsave -.
read_lock_irq                `-> rt_read_lock_irqsave()
       `->read_lock ---------.       \
          read_lock_bh ------+        \
                             `--> rt_read_lock()
                                   if (rt_mutex_owner(lock) != current){
                                           `-> __rt_spin_lock()
                                                rt_spin_lock_fastlock()
                                                       `->rt_mutex_cmpxchg()
                                    migrate_disable()
                                   }
                                   rwlock->read_depth++;
read_trylock mapping:

read_trylock
          `-> rt_read_trylock
               if (rt_mutex_owner(lock) != current){
                                              `-> rt_mutex_trylock()
                                                   rt_mutex_fasttrylock()
                                                    rt_mutex_cmpxchg()
                migrate_disable()
               }
               rwlock->read_depth++;

read_unlock* mapping:

read_unlock_bh --------+
read_unlock_irq -------+
read_unlock_irqrestore +
read_unlock -----------+
                       `-> rt_read_unlock()
                            if(--rwlock->read_depth==0){
                                      `-> __rt_spin_unlock()
                                           rt_spin_lock_fastunlock()
                                                        `-> rt_mutex_cmpxchg()
                             migrate_disable()
                            }

So calls to migrate_disable/enable() are better placed at the rt_read_*
level of lock/trylock/unlock as all of the read_*lock* API has this as a
common path. In the rt_read* API of lock/trylock/unlock the nesting level
is already being recorded in rwlock->read_depth, so we can push down the
migrate disable/enable to that level and condition it on the read_depth
going from 0 to 1 -> migrate_disable and 1 to 0 -> migrate_enable. This
eliminates the recursive calls that were needed when migrate_disable/enable
was done at the read_*lock* level.

The approach to read_*_bh also eliminates the concerns raised with
regard to API imbalances (read_lock_bh -> read_unlock + local_bh_enable).

Tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
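
A sketch of the pushed-down read-lock path (simplified from the diagram above;
rt_mutex_owner() and __rt_spin_lock() are -rt internals):

  void sketch_rt_read_lock(rwlock_t *rwlock)
  {
      struct rt_mutex *lock = &rwlock->lock;

      /* A recursive read by the owner only bumps the depth. */
      if (rt_mutex_owner(lock) != current) {
          __rt_spin_lock(lock);
          migrate_disable();    /* only on the 0 -> 1 transition */
      }
      rwlock->read_depth++;
  }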
2020-10-14 00:59:17 +03:00
Nicholas Mc Guire 8a5392d84d write_lock migrate_disable pushdown to rt_write_lock
Pushdown of migrate_disable/enable from the write_*lock* to the rt_write_*lock*
API level.

general mapping of write_*lock* to mutexes:

write_*lock*
  `-> rt_write_*lock*
          `-> __spin_lock (the sleeping __spin_lock)
                 `-> rt_mutex

write_*lock*s are non-recursive, so we have two lock chains to consider:
 - write_trylock*/write_unlock
 - write_lock*/write_unlock
For both paths the migrate_disable/enable must be balanced.

write_trylock* mapping:

write_trylock_irqsave
                `-> rt_write_trylock_irqsave
write_trylock             \
          `-------->  rt_write_trylock
                       ret = rt_mutex_trylock
                              rt_mutex_fasttrylock
                               rt_mutex_cmpxchg
                       if (ret)
                            migrate_disable

write_lock* mapping:

                  write_lock_irqsave
                                `-> rt_write_lock_irqsave
write_lock_irq -> write_lock ----.     \
                  write_lock_bh -+      \
                                 `-> rt_write_lock
                                      __rt_spin_lock()
                                       rt_spin_lock_fastlock()
                                        rt_mutex_cmpxchg()
                                     migrate_disable()

write_unlock* mapping:

                    write_unlock_irqrestore.
                    write_unlock_bh -------+
write_unlock_irq -> write_unlock ----------+
                                           `-> rt_write_unlock()
                                                __rt_spin_unlock()
                                                 rt_spin_lock_fastunlock()
                                                  rt_mutex_cmpxchg()
                                               migrate_enable()

So calls to migrate_disable/enable() are better placed at the rt_write_*
level of lock/trylock/unlock as all of the write_*lock* API has this as a
common path.

This approach to write_*_bh also eliminates the concerns raised with
regard to API imbalances (write_lock_bh -> write_unlock + local_bh_enable).

Tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
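
A sketch of the write-side counterpart (simplified; __rt_spin_lock() and
__rt_spin_unlock() are -rt internals, and writers are non-recursive):

  void sketch_rt_write_lock(rwlock_t *rwlock)
  {
      __rt_spin_lock(&rwlock->lock);
      migrate_disable();
  }

  void sketch_rt_write_unlock(rwlock_t *rwlock)
  {
      __rt_spin_unlock(&rwlock->lock);
      migrate_enable();
  }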
2020-10-14 00:59:17 +03:00
Steven Rostedt (Red Hat) d5af4a6a85 rwsem-rt: Do not allow readers to nest
The readers of mainline rwsems are not allowed to nest; the rwsems in the
PREEMPT_RT kernel should not nest either.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:16 +03:00
Thomas Gleixner 6c23aac84d rt: Add the preempt-rt lock replacement APIs
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex
based locking functions for preempt-rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:16 +03:00
Thomas Gleixner d2cbb5602d rwsem-add-rt-variant.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:16 +03:00
Thomas Gleixner be6c3445a5 rt-add-rt-to-mutex-headers.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:16 +03:00