linux

Commit Graph

Author	SHA1	Message	Date
Steven Rostedt	ace81409f9	acpi/rt: Convert acpi_gbl_hardware lock back to a raw_spinlock_t We hit the following bug with 3.6-rt: [ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002 [ 5.898991] no locks held by swapper/3/0. [ 5.898993] Modules linked in: [ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1 [ 5.898997] Call Trace: [ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90 [ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0 [ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200 [ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70 [ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002 [ 5.899037] no locks held by swapper/7/0. [ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0 [ 5.899040] Modules linked in: [ 5.899041] [ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90 [ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1 [ 5.899047] Call Trace: [ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40 [ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90 [ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80 [ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0 [ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23 [ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200 [ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0 [ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70 [ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51 [ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0 [ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e [ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90 [ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30 [ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40 [ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20 [ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80 [ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50 [ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23 [ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ? As the acpi code disables interrupts in acpi_idle_enter_bm, and calls code that grabs the acpi lock, it causes issues as the lock is currently in RT a sleeping lock. The lock was converted from a raw to a sleeping lock due to some previous issues, and tests that showed it didn't seem to matter. Unfortunately, it did matter for one of our boxes. This patch converts the lock back to a raw lock. I've run this code on a few of my own machines, one being my laptop that uses the acpi quite extensively. I've been able to suspend and resume without issues. [ tglx: Made the change exclusive for acpi_gbl_hardware_lock ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: John Kacur <jkacur@gmail.com> Cc: Clark Williams <clark@redhat.com> Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	c2f971239d	dm: Make rt aware Use the BUG_ON_NORT variant for the irq_disabled() checks. RT has interrupts legitimately enabled here as we cant deadlock against the irq thread due to the "sleeping spinlocks" conversion. Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org> Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Sebastian Andrzej Siewior	f24bd91f4b	crypto: Reduce preempt disabled regions, more algos Don Estabrook reported \| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100() \| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200() \| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100() and his backtrace showed some crypto functions which looked fine. The problem is the following sequence: glue_xts_crypt_128bit() { blkcipher_walk_virt(); /* normal migrate_disable() / glue_fpu_begin(); / get atomic / while (nbytes) { __glue_xts_crypt_128bit(); blkcipher_walk_done(); / with nbytes = 0, migrate_enable() * while we are atomic / }; glue_fpu_end() / no longer atomic */ } and this is why the counter get out of sync and the warning is printed. The other problem is that we are non-preemptible between glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this, I shorten the FPU off region and ensure blkcipher_walk_done() is called with preemption enabled. This might hurt the performance because we now enable/disable the FPU state more often but we gain lower latency and the bug is gone. Cc: stable-rt@vger.kernel.org Reported-by: Don Estabrook <don.estabrook@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:20 +03:00
Peter Zijlstra	d6bfa1b4dd	x86: crypto: Reduce preempt disabled regions Restrict the preempt disabled regions to the actual floating point operations and enable preemption for the administrative actions. This is necessary on RT to avoid that kfree and other operations are called with preemption disabled. Reported-and-tested-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	d9e0960e2b	scsi-fcoe-rt-aware.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	5494c91c48	x86-kvm-require-const-tsc-for-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Peter Zijlstra	76c00e307a	ipc/sem: Rework semaphore wakeups Current sysv sems have a weird ass wakeup scheme that involves keeping preemption disabled over a potential O(n^2) loop and busy waiting on that on other CPUs. Kill this and simply wake the task directly from under the sem_lock. This was discovered by a migrate_disable() debug feature that disallows: spin_lock(); preempt_disable(); spin_unlock() preempt_enable(); Cc: Manfred Spraul <manfred@colorfullife.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Manfred Spraul <manfred@colorfullife.com> Link: http://lkml.kernel.org/r/1315994224.5040.1.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	4fa1901f89	arm-enable-highmem-for-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Sebastian Andrzej Siewior	6f156f654f	arm/highmem: flush tlb on unmap The tlb should be flushed on unmap and thus make the mapping entry invalid. This is only done in the non-debug case which does not look right. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:20 +03:00
Sebastian Andrzej Siewior	ebc1791957	x86/highmem: add a "already used pte" check This is a copy from kmap_atomic_prot(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:20 +03:00
Peter Zijlstra	c9987e379e	mm, rt: kmap_atomic scheduling In fact, with migrate_disable() existing one could play games with kmap_atomic. You could save/restore the kmap_atomic slots on context switch (if there are any in use of course), this should be esp easy now that we have a kmap_atomic stack. Something like the below.. it wants replacing all the preempt_disable() stuff with pagefault_disable() && migrate_disable() of course, but then you can flip kmaps around like below. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [dvhart@linux.intel.com: build fix] Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins [tglx@linutronix.de: Get rid of the per cpu variable and store the idx and the pte content right away in the task struct. Shortens the context switch code. ]	2020-10-14 00:59:20 +03:00
Clark Williams	7d273fd704	add /sys/kernel/realtime entry Add a /sys/kernel entry to indicate that the kernel is a realtime kernel. Clark says that he needs this for udev rules, udev needs to evaluate if its a PREEMPT_RT kernel a few thousand times and parsing uname output is too slow or so. Are there better solutions? Should it exist and return 0 on !-rt? Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2020-10-14 00:59:20 +03:00
Jason Wessel	0c57f4abac	kgdb/serial: Short term workaround On 07/27/2011 04:37 PM, Thomas Gleixner wrote: > - KGDB (not yet disabled) is reportedly unusable on -rt right now due > to missing hacks in the console locking which I dropped on purpose. > To work around this in the short term you can use this patch, in addition to the clocksource watchdog patch that Thomas brewed up. Comments are welcome of course. Ultimately the right solution is to change separation between the console and the HW to have a polled mode + work queue so as not to introduce any kind of latency. Thanks, Jason.	2020-10-14 00:59:20 +03:00
Carsten Emde	2840ec0444	net: sysrq via icmp There are (probably rare) situations when a system crashed and the system console becomes unresponsive but the network icmp layer still is alive. Wouldn't it be wonderful, if we then could submit a sysreq command via ping? This patch provides this facility. Please consult the updated documentation Documentation/sysrq.txt for details. Signed-off-by: Carsten Emde <C.Emde@osadl.org>	2020-10-14 00:59:20 +03:00
Steven Rostedt	9d54583072	net: Avoid livelock in net_tx_action() on RT qdisc_lock is taken w/o disabling interrupts or bottom halfs. So code holding a qdisc_lock() can be interrupted and softirqs can run on the return of interrupt in !RT. The spin_trylock() in net_tx_action() makes sure, that the softirq does not deadlock. When the lock can't be acquired q is requeued and the NET_TX softirq is raised. That causes the softirq to run over and over. That works in mainline as do_softirq() has a retry loop limit and leaves the softirq processing in the interrupt return path and schedules ksoftirqd. The task which holds qdisc_lock cannot be preempted, so the lock is released and either ksoftirqd or the next softirq in the return from interrupt path can proceed. Though it's a bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it decides to bail out even if it's clear in the first iteration :) On RT all softirq processing is done in a FIFO thread and we don't have a loop limit, so ksoftirqd preempts the lock holder forever and unqueues and requeues until the reset button is hit. Due to the forced threading of ksoftirqd on RT we actually cannot deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to replace the spin_trylock() with a spin_lock(). When contended, ksoftirqd is scheduled out and the lock holder can proceed. [ tglx: Massaged changelog and code comments ] Solved-by: Thomas Gleixner <tglx@linuxtronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Carsten Emde <cbe@osadl.org> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	8360f12a58	mips-disable-highmem-on-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Sebastian Andrzej Siewior	89b21de987	arm/unwind: use a raw_spin_lock Mostly unwind is done with irqs enabled however SLUB may call it with irqs disabled while creating a new SLUB cache. I had system freeze while loading a module which called kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled interrupts and then ->new_slab_objects() ->new_slab() ->setup_object() ->setup_object_debug() ->init_tracking() ->set_track() ->save_stack_trace() ->save_stack_trace_tsk() ->walk_stackframe() ->unwind_frame() ->unwind_find_idx() =>spin_lock_irqsave(&unwind_lock); Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	5a25025701	ARM: at91: tclib: Default to tclib timer for RT RT is not too happy about the shared timer interrupt in AT91 devices. Default to tclib timer for RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	b632644665	arm-disable-highmem-on-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	7bfb5a3d14	power-disable-highmem-on-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:20 +03:00
Thomas Gleixner	3e66edec99	Powerpc: Use generic rwsem on RT Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Sebastian Andrzej Siewior	522e494098	HACK: printk: drop the logbuf_lock more often The lock is hold with irgs off. The latency drops 500us+ on my arm bugs with a "full" buffer after executing "dmesg" on the shell. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	3ad7ca37ae	printk-rt-aware.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Sebastian Andrzej Siewior	6b2a704800	irq_work: allow certain work in hard irq context irq_work is processed in softirq context on -RT because we want to avoid long latencies which might arise from processing lots of perf events. The noHZ-full mode requires its callback to be called from real hardirq context (commit `76c24fb` ("nohz: New APIs to re-evaluate the tick on full dynticks CPUs")). If it is called from a thread context we might get wrong results for checks like "is_idle_task(current)". This patch introduces a second list (hirq_work_list) which will be used if irq_work_run() has been invoked from hardirq context and process only work items marked with IRQ_WORK_HARD_IRQ. This patch also removes arch_irq_work_raise() from sparc & powerpc like it is already done for x86. Atleast for powerpc it is somehow superfluous because it is called from the timer interrupt which should invoke update_process_times(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	10a8fb755e	x86-no-perf-irq-work-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	26af336d12	use skbufhead with raw lock Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	0eb8f356f4	jump-label-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	898b8b796f	debugobjects-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Sebastian Andrzej Siewior	5646da4a11	percpu_ida: use locklocks the local_irq_save() + spin_lock() does not work that well on -RT Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	3126470072	idr: Use local lock instead of preempt enable/disable We need to protect the per cpu variable and prevent migration. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	1d1230719a	sched: Distangle worker accounting from rqlock The worker accounting for cpu bound workers is plugged into the core scheduler code and the wakeup code. This is not a hard requirement and can be avoided by keeping track of the state in the workqueue code itself. Keep track of the sleeping state in the worker itself and call the notifier before entering the core scheduler. There might be false positives when the task is woken between that call and actually scheduling, but that's not really different from scheduling and being woken immediately after switching away. There is also no harm from updating nr_running when the task returns from scheduling instead of accounting it in the wakeup code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	bbb2691eab	workqueue vs ata-piix livelock fixup An Intel i7 system regularly detected rcu_preempt stalls after the kernel was upgraded from 3.6-rt to 3.8-rt. When the stall happened, disk I/O was no longer possible, unless the system was restarted. The kernel message was: INFO: rcu_preempt self-detected stall on CPU { 6} [..] NMI backtrace for cpu 6 CPU 6 Pid: 119, comm: irq/19-ata_piix Not tainted 3.8.13-rt13 #11 Shuttle Inc. SX58/SX58 RIP: 0010:[<ffffffff8124ca60>] [<ffffffff8124ca60>] ip_compute_csum+0x30/0x30 RSP: 0018:ffff880333303cb0 EFLAGS: 00000002 RAX: 0000000000000006 RBX: 00000000000003e9 RCX: 0000000000000034 RDX: 0000000000000000 RSI: ffffffff81aa16d0 RDI: 0000000000000001 RBP: ffff880333303ce8 R08: ffffffff81aa16d0 R09: ffffffff81c1b8cc R10: 0000000000000000 R11: 0000000000000000 R12: 000000000005161f R13: 0000000000000006 R14: ffffffff81aa16d0 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff880333300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003c1b2bb420 CR3: 0000000001a0f000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process irq/19-ata_piix (pid: 119, threadinfo ffff88032d88a000, task ffff88032df80000) Stack: ffffffff8124cb32 000000000005161e 00000000000003e9 0000000000001000 0000000000009022 ffffffff81aa16d0 0000000000000002 ffff880333303cf8 ffffffff8124caa9 ffff880333303d08 ffffffff8124cad2 ffff880333303d28 Call Trace: <IRQ> [<ffffffff8124cb32>] ? delay_tsc+0x33/0xe3 [<ffffffff8124caa9>] __delay+0xf/0x11 [<ffffffff8124cad2>] __const_udelay+0x27/0x29 [<ffffffff8102d1fa>] native_safe_apic_wait_icr_idle+0x39/0x45 [<ffffffff8102dc9b>] __default_send_IPI_dest_field.constprop.0+0x1e/0x58 [<ffffffff8102dd1e>] default_send_IPI_mask_sequence_phys+0x49/0x7d [<ffffffff81030326>] physflat_send_IPI_all+0x17/0x19 [<ffffffff8102de53>] arch_trigger_all_cpu_backtrace+0x50/0x79 [<ffffffff810b21d0>] rcu_check_callbacks+0x1cb/0x568 [<ffffffff81048c9c>] ? raise_softirq+0x2e/0x35 [<ffffffff81086be0>] ? tick_sched_do_timer+0x38/0x38 [<ffffffff8104f653>] update_process_times+0x44/0x55 [<ffffffff81086866>] tick_sched_handle+0x4a/0x59 [<ffffffff81086c1c>] tick_sched_timer+0x3c/0x5b [<ffffffff81062845>] __run_hrtimer+0x9b/0x158 [<ffffffff810631d8>] hrtimer_interrupt+0x172/0x2aa [<ffffffff8102d498>] smp_apic_timer_interrupt+0x76/0x89 [<ffffffff814d881d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff81057cd2>] ? __local_lock_irqsave+0x17/0x4a [<ffffffff81059336>] try_to_grab_pending+0x42/0x17e [<ffffffff8105a699>] mod_delayed_work_on+0x32/0x88 [<ffffffff8105a70b>] mod_delayed_work+0x1c/0x1e [<ffffffff8122ae84>] blk_run_queue_async+0x37/0x39 [<ffffffff81230985>] flush_end_io+0xf1/0x107 [<ffffffff8122e0da>] blk_finish_request+0x21e/0x264 [<ffffffff8122e162>] blk_end_bidi_request+0x42/0x60 [<ffffffff8122e1ba>] blk_end_request+0x10/0x12 [<ffffffff8132de46>] scsi_io_completion+0x1bf/0x492 [<ffffffff81335cec>] ? sd_done+0x298/0x2ef [<ffffffff81325a02>] scsi_finish_command+0xe9/0xf2 [<ffffffff8132dbcb>] scsi_softirq_done+0x106/0x10f [<ffffffff812333d3>] blk_done_softirq+0x77/0x87 [<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1 [<ffffffff810aa820>] ? irq_thread_fn+0x3a/0x3a [<ffffffff81048466>] local_bh_enable+0x43/0x72 [<ffffffff810aa866>] irq_forced_thread_fn+0x46/0x52 [<ffffffff810ab089>] irq_thread+0x8c/0x17c [<ffffffff810ab179>] ? irq_thread+0x17c/0x17c [<ffffffff810aaffd>] ? wake_threads_waitq+0x44/0x44 [<ffffffff8105eb18>] kthread+0x8d/0x95 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 [<ffffffff814d7b7c>] ret_from_fork+0x7c/0xb0 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 The state of softirqd of this CPU at the time of the crash was: ksoftirqd/6 R running task 0 53 2 0x00000000 ffff88032fc39d18 0000000000000046 ffff88033330c4c0 ffff8803303f4710 ffff88032fc39fd8 ffff88032fc39fd8 0000000000000000 0000000000062500 ffff88032df88000 ffff8803303f4710 0000000000000000 ffff88032fc38000 Call Trace: [<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c [<ffffffff814d178c>] preempt_schedule+0x61/0x76 [<ffffffff8106cccf>] migrate_enable+0xe5/0x1df [<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c [<ffffffff8104ef52>] run_timer_softirq+0x161/0x1d6 [<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1 [<ffffffff8104840b>] run_ksoftirqd+0x2d/0x45 [<ffffffff8106658a>] smpboot_thread_fn+0x2ea/0x308 [<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc [<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc [<ffffffff8105eb18>] kthread+0x8d/0x95 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 [<ffffffff814d7afc>] ret_from_fork+0x7c/0xb0 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 Apparently, the softirq demon and the ata_piix IRQ handler were waiting for each other to finish ending up in a livelock. After the below patch was applied, the system no longer crashes. Reported-by: Carsten Emde <C.Emde@osadl.org> Proposed-by: Thomas Gleixner <tglx@linutronix.de> Tested by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	2d1e8fd236	Use local irq lock instead of irq disable regions Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	91965a4c3d	workqueue: Use normal rcu There is no need for sched_rcu. The undocumented reason why sched_rcu is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by abusing the fact that sched_rcu reader side critical sections are also protected by preempt or irq disabled regions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	59c262358b	net: Use cpu_chill() instead of cpu_relax() Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:19 +03:00
Thomas Gleixner	c2141dac11	fs: dcache: Use cpu_chill() in trylock loops Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:19 +03:00
Thomas Gleixner	6f78cf1452	block: Use cpu_chill() for retry loops Retry loops on RT might loop forever when the modifying side was preempted. Steven also observed a live lock when there was a concurrent priority boosting going on. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:19 +03:00
Sebastian Andrzej Siewior	86a638b855	blk-mq: revert raw locks, post pone notifier to POST_DEAD The blk_mq_cpu_notify_lock should be raw because some CPU down levels are called with interrupts off. The notifier itself calls currently one function that is blk_mq_hctx_notify(). That function acquires the ctx->lock lock which is sleeping and I would prefer to keep it that way. That function only moves IO-requests from the CPU that is going offline to another CPU and it is currently the only one. Therefore I revert the list lock back to sleeping spinlocks and let the notifier run at POST_DEAD time. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Steven Rostedt	0041f854b5	cpu_chill: Add a UNINTERRUPTIBLE hrtimer_nanosleep We hit another bug that was caused by switching cpu_chill() from msleep() to hrtimer_nanosleep(). This time it is a livelock. The problem is that hrtimer_nanosleep() calls schedule with the state == TASK_INTERRUPTIBLE. But these means that if a signal is pending, the scheduler wont schedule, and will simply change the current task state back to TASK_RUNNING. This nullifies the whole point of cpu_chill() in the first place. That is, if a task is spinning on a try_lock() and it preempted the owner of the lock, if it has a signal pending, it will never give up the CPU to let the owner of the lock run. I made a static function __hrtimer_nanosleep() that takes a fifth parameter "state", which determines the task state of that the nanosleep() will be in. The normal hrtimer_nanosleep() will act the same, but cpu_chill() will call the __hrtimer_nanosleep() directly with the TASK_UNINTERRUPTIBLE state. cpu_chill() only cares that the first sleep happens, and does not care about the state of the restart schedule (in hrtimer_nanosleep_restart). Cc: stable-rt@vger.kernel.org Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Sebastian Andrzej Siewior	f0414724df	kernel/hrtimer: be non-freezeable in cpu_chill() Since we replaced msleep() by hrtimer I see now and then (rarely) this: \| [....] Waiting for /dev to be fully populated... \| ===================================== \| [ BUG: udevd/229 still has locks held! ] \| 3.12.11-rt17 #23 Not tainted \| ------------------------------------- \| 1 lock held by udevd/229: \| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98 \| \| stack backtrace: \| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23 \| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14) \| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc) \| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160) \| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110) \| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38) \| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec) \| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c) \| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50) \| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44) \| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98) \| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc) \| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60) \| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c) \| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c) \| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94) \| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30) \| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48) For now I see no better way but to disable the freezer the sleep the period. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Steven Rostedt	58463fcb2b	rt: Make cpu_chill() use hrtimer instead of msleep() Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is called from softirq context, it may block the ksoftirqd() from running, in which case, it may never wake up the msleep() causing the deadlock. I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call, running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be blocked on a lock that irq/74-qla2xxx holds, we have our deadlock. The solution is not to convert the cpu_chill() back to a cpu_relax() as that will re-create a possible live lock that the cpu_chill() fixed earlier, and may also leave this bug open on other softirqs. The fix is to remove the dependency on ksoftirqd from cpu_chill(). That is, instead of calling msleep() that requires ksoftirqd to wake it up, use the hrtimer_nanosleep() code that does the wakeup from hard irq context. \|Looks to be the lock of the block softirq. I don't have the core dump \|anymore, but from what I could tell the ksoftirqd was blocked on the \|block softirq lock, where the block softirq handler did a msleep \|(called by the qla2xxx interrupt handler). \| \|Looking at trigger_softirq() in block/blk-softirq.c, it can do a \|smp_callfunction() to another cpu to run the block softirq. If that \|happens to be the cpu where the qla2xx irq handler is doing the block \|softirq and is in a middle of a msleep(), I believe the ksoftirqd will \|try to run the softirq. If it does that, then BOOM, it's deadlocked \|because the ksoftirqd will never run the timer softirq either. \|I should have also stated that it was only one lock that was involved. \|But the lock owner was doing a msleep() that requires a wakeup by \|ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock \|held by the msleep() caller, then you have your deadlock. \| \|It's best not to have any softirqs going to sleep requiring another \|softirq to wake it up. Note, if we ever require a timer softirq to do a \|cpu_chill() it will most definitely hit this deadlock. Cc: stable-rt@vger.kernel.org Found-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bigeasy: add the 4 \| chapters from email] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	b1a75a3071	rt: Introduce cpu_chill() Retry loops on RT might loop forever when the modifying side was preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill() defaults to cpu_relax() for non RT. On RT it puts the looping task to sleep for a tick so the preempted task can make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:18 +03:00
Sebastian Andrzej Siewior	2d535af0db	block: mq: use cpu_light() there is a might sleep splat because get_cpu() disables preemption and later we grab a lock. As a workaround for this we use get_cpu_light() and an additional lock to prevent taking the same ctx. There is a lock member in the ctx already but there some functions which do ++ on the member and this works with irq off but on RT we would need the extra lock. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	cc5d4c7047	mm-vmalloc.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	7643ead733	epoll.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	8ed5077c73	x86: Use generic rwsem_spinlocks on -rt Simplifies the separation of anon_rw_semaphores and rw_semaphores for -rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	633db31439	x86: stackprotector: Avoid random pool on rt CPU bringup calls into the random pool to initialize the stack canary. During boot that works nicely even on RT as the might sleep checks are disabled. During CPU hotplug the might sleep checks trigger. Making the locks in random raw is a major PITA, so avoid the call on RT is the only sensible solution. This is basically the same randomness which we get during boot where the random pool has no entropy and we rely on the TSC randomnness. Reported-by: Carsten Emde <carsten.emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Steven Rostedt	2aa6aec664	x86/mce: Defer mce wakeups to threads for PREEMPT_RT We had a customer report a lockup on a 3.0-rt kernel that had the following backtrace: [ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113 [ffff88107fca3f40] rt_spin_lock at ffffffff81499a56 [ffff88107fca3f50] __wake_up at ffffffff81043379 [ffff88107fca3f80] mce_notify_irq at ffffffff81017328 [ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508 [ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1 [ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853 It actually bugged because the lock was taken by the same owner that already had that lock. What happened was the thread that was setting itself on a wait queue had the lock when an MCE triggered. The MCE interrupt does a wake up on its wait list and grabs the same lock. NOTE: THIS IS NOT A BUG ON MAINLINE Sorry for yelling, but as I Cc'd mainline maintainers I want them to know that this is an PREEMPT_RT bug only. I only Cc'd them for advice. On PREEMPT_RT the wait queue locks are converted from normal "spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above). These are not to be taken by hard interrupt context. This usually isn't a problem as most all interrupts in PREEMPT_RT are converted into schedulable threads. Unfortunately that's not the case with the MCE irq. As wait queue locks are notorious for long hold times, we can not convert them to raw_spin_locks without causing issues with -rt. But Thomas has created a "simple-wait" structure that uses raw spin locks which may have been a good fit. Unfortunately, wait queues are not the only issue, as the mce_notify_irq also does a schedule_work(), which grabs the workqueue spin locks that have the exact same issue. Thus, this patch I'm proposing is to move the actual work of the MCE interrupt into a helper thread that gets woken up on the MCE interrupt and does the work in a schedulable context. NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET Oops, sorry for yelling again, but I want to stress that I keep the same behavior of mainline when PREEMPT_RT is not set. Thus, this only changes the MCE behavior when PREEMPT_RT is configured. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bigeasy@linutronix: make mce_notify_work() a proper prototype, use kthread_run()] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	f962e0c91a	x86: Convert mce timer to hrtimer mce_timer is started in atomic contexts of cpu bringup. This results in might_sleep() warnings on RT. Convert mce_timer to a hrtimer to avoid this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> fold in: \|From: Mike Galbraith <bitbucket@online.de> \|Date: Wed, 29 May 2013 13:52:13 +0200 \|Subject: [PATCH] x86/mce: fix mce timer interval \| \|Seems mce timer fire at the wrong frequency in -rt kernels since roughly \|forever due to 32 bit overflow. 3.8-rt is also missing a multiplier. \| \|Add missing us -> ns conversion and 32 bit overflow prevention. \| \|Signed-off-by: Mike Galbraith <bitbucket@online.de> \|[bigeasy: use ULL instead of u64 cast] \|Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Sebastian Andrzej Siewior	f9b2a5eee5	fs: jbd2: pull your plug when waiting for space Two cps in parallel managed to stall the the ext4 fs. It seems that journal code is either waiting for locks or sleeping waiting for something to happen. This seems similar to what Mike observed on ext3, here is his description: \|With an -rt kernel, and a heavy sync IO load, tasks can jam \|up on journal locks without unplugging, which can lead to \|terminal IO starvation. Unplug and schedule when waiting \|for space. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00

1 2 3 4 5 ...

431682 Commits All Branches Search

431682 Commits

All Branches