linux

Commit Graph

Author	SHA1	Message	Date
Thomas Gleixner	3126470072	idr: Use local lock instead of preempt enable/disable We need to protect the per cpu variable and prevent migration. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	1d1230719a	sched: Distangle worker accounting from rqlock The worker accounting for cpu bound workers is plugged into the core scheduler code and the wakeup code. This is not a hard requirement and can be avoided by keeping track of the state in the workqueue code itself. Keep track of the sleeping state in the worker itself and call the notifier before entering the core scheduler. There might be false positives when the task is woken between that call and actually scheduling, but that's not really different from scheduling and being woken immediately after switching away. There is also no harm from updating nr_running when the task returns from scheduling instead of accounting it in the wakeup code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110622174919.135236139@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	bbb2691eab	workqueue vs ata-piix livelock fixup An Intel i7 system regularly detected rcu_preempt stalls after the kernel was upgraded from 3.6-rt to 3.8-rt. When the stall happened, disk I/O was no longer possible, unless the system was restarted. The kernel message was: INFO: rcu_preempt self-detected stall on CPU { 6} [..] NMI backtrace for cpu 6 CPU 6 Pid: 119, comm: irq/19-ata_piix Not tainted 3.8.13-rt13 #11 Shuttle Inc. SX58/SX58 RIP: 0010:[<ffffffff8124ca60>] [<ffffffff8124ca60>] ip_compute_csum+0x30/0x30 RSP: 0018:ffff880333303cb0 EFLAGS: 00000002 RAX: 0000000000000006 RBX: 00000000000003e9 RCX: 0000000000000034 RDX: 0000000000000000 RSI: ffffffff81aa16d0 RDI: 0000000000000001 RBP: ffff880333303ce8 R08: ffffffff81aa16d0 R09: ffffffff81c1b8cc R10: 0000000000000000 R11: 0000000000000000 R12: 000000000005161f R13: 0000000000000006 R14: ffffffff81aa16d0 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff880333300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003c1b2bb420 CR3: 0000000001a0f000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process irq/19-ata_piix (pid: 119, threadinfo ffff88032d88a000, task ffff88032df80000) Stack: ffffffff8124cb32 000000000005161e 00000000000003e9 0000000000001000 0000000000009022 ffffffff81aa16d0 0000000000000002 ffff880333303cf8 ffffffff8124caa9 ffff880333303d08 ffffffff8124cad2 ffff880333303d28 Call Trace: <IRQ> [<ffffffff8124cb32>] ? 
delay_tsc+0x33/0xe3 [<ffffffff8124caa9>] __delay+0xf/0x11 [<ffffffff8124cad2>] __const_udelay+0x27/0x29 [<ffffffff8102d1fa>] native_safe_apic_wait_icr_idle+0x39/0x45 [<ffffffff8102dc9b>] __default_send_IPI_dest_field.constprop.0+0x1e/0x58 [<ffffffff8102dd1e>] default_send_IPI_mask_sequence_phys+0x49/0x7d [<ffffffff81030326>] physflat_send_IPI_all+0x17/0x19 [<ffffffff8102de53>] arch_trigger_all_cpu_backtrace+0x50/0x79 [<ffffffff810b21d0>] rcu_check_callbacks+0x1cb/0x568 [<ffffffff81048c9c>] ? raise_softirq+0x2e/0x35 [<ffffffff81086be0>] ? tick_sched_do_timer+0x38/0x38 [<ffffffff8104f653>] update_process_times+0x44/0x55 [<ffffffff81086866>] tick_sched_handle+0x4a/0x59 [<ffffffff81086c1c>] tick_sched_timer+0x3c/0x5b [<ffffffff81062845>] __run_hrtimer+0x9b/0x158 [<ffffffff810631d8>] hrtimer_interrupt+0x172/0x2aa [<ffffffff8102d498>] smp_apic_timer_interrupt+0x76/0x89 [<ffffffff814d881d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff81057cd2>] ? __local_lock_irqsave+0x17/0x4a [<ffffffff81059336>] try_to_grab_pending+0x42/0x17e [<ffffffff8105a699>] mod_delayed_work_on+0x32/0x88 [<ffffffff8105a70b>] mod_delayed_work+0x1c/0x1e [<ffffffff8122ae84>] blk_run_queue_async+0x37/0x39 [<ffffffff81230985>] flush_end_io+0xf1/0x107 [<ffffffff8122e0da>] blk_finish_request+0x21e/0x264 [<ffffffff8122e162>] blk_end_bidi_request+0x42/0x60 [<ffffffff8122e1ba>] blk_end_request+0x10/0x12 [<ffffffff8132de46>] scsi_io_completion+0x1bf/0x492 [<ffffffff81335cec>] ? sd_done+0x298/0x2ef [<ffffffff81325a02>] scsi_finish_command+0xe9/0xf2 [<ffffffff8132dbcb>] scsi_softirq_done+0x106/0x10f [<ffffffff812333d3>] blk_done_softirq+0x77/0x87 [<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1 [<ffffffff810aa820>] ? irq_thread_fn+0x3a/0x3a [<ffffffff81048466>] local_bh_enable+0x43/0x72 [<ffffffff810aa866>] irq_forced_thread_fn+0x46/0x52 [<ffffffff810ab089>] irq_thread+0x8c/0x17c [<ffffffff810ab179>] ? irq_thread+0x17c/0x17c [<ffffffff810aaffd>] ? 
wake_threads_waitq+0x44/0x44 [<ffffffff8105eb18>] kthread+0x8d/0x95 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 [<ffffffff814d7b7c>] ret_from_fork+0x7c/0xb0 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 The state of softirqd of this CPU at the time of the crash was: ksoftirqd/6 R running task 0 53 2 0x00000000 ffff88032fc39d18 0000000000000046 ffff88033330c4c0 ffff8803303f4710 ffff88032fc39fd8 ffff88032fc39fd8 0000000000000000 0000000000062500 ffff88032df88000 ffff8803303f4710 0000000000000000 ffff88032fc38000 Call Trace: [<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c [<ffffffff814d178c>] preempt_schedule+0x61/0x76 [<ffffffff8106cccf>] migrate_enable+0xe5/0x1df [<ffffffff8105a3ae>] ? __queue_work+0x27c/0x27c [<ffffffff8104ef52>] run_timer_softirq+0x161/0x1d6 [<ffffffff8104826f>] do_current_softirqs+0x172/0x2e1 [<ffffffff8104840b>] run_ksoftirqd+0x2d/0x45 [<ffffffff8106658a>] smpboot_thread_fn+0x2ea/0x308 [<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc [<ffffffff810662a0>] ? test_ti_thread_flag+0xc/0xc [<ffffffff8105eb18>] kthread+0x8d/0x95 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 [<ffffffff814d7afc>] ret_from_fork+0x7c/0xb0 [<ffffffff8105ea8b>] ? __kthread_parkme+0x65/0x65 Apparently, the softirq demon and the ata_piix IRQ handler were waiting for each other to finish ending up in a livelock. After the below patch was applied, the system no longer crashes. Reported-by: Carsten Emde <C.Emde@osadl.org> Proposed-by: Thomas Gleixner <tglx@linutronix.de> Tested by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	2d1e8fd236	Use local irq lock instead of irq disable regions Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	91965a4c3d	workqueue: Use normal rcu There is no need for sched_rcu. The undocumented reason why sched_rcu is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by abusing the fact that sched_rcu reader side critical sections are also protected by preempt or irq disabled regions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:19 +03:00
Thomas Gleixner	59c262358b	net: Use cpu_chill() instead of cpu_relax() Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:19 +03:00
Thomas Gleixner	c2141dac11	fs: dcache: Use cpu_chill() in trylock loops Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:19 +03:00
Thomas Gleixner	6f78cf1452	block: Use cpu_chill() for retry loops Retry loops on RT might loop forever when the modifying side was preempted. Steven also observed a live lock when there was a concurrent priority boosting going on. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:19 +03:00
Sebastian Andrzej Siewior	86a638b855	blk-mq: revert raw locks, post pone notifier to POST_DEAD The blk_mq_cpu_notify_lock should be raw because some CPU down levels are called with interrupts off. The notifier itself calls currently one function that is blk_mq_hctx_notify(). That function acquires the ctx->lock lock which is sleeping and I would prefer to keep it that way. That function only moves IO-requests from the CPU that is going offline to another CPU and it is currently the only one. Therefore I revert the list lock back to sleeping spinlocks and let the notifier run at POST_DEAD time. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Steven Rostedt	0041f854b5	cpu_chill: Add a UNINTERRUPTIBLE hrtimer_nanosleep We hit another bug that was caused by switching cpu_chill() from msleep() to hrtimer_nanosleep(). This time it is a livelock. The problem is that hrtimer_nanosleep() calls schedule with the state == TASK_INTERRUPTIBLE. But this means that if a signal is pending, the scheduler won't schedule, and will simply change the current task state back to TASK_RUNNING. This nullifies the whole point of cpu_chill() in the first place. That is, if a task is spinning on a try_lock() and it preempted the owner of the lock, if it has a signal pending, it will never give up the CPU to let the owner of the lock run. I made a static function __hrtimer_nanosleep() that takes a fifth parameter "state", which determines the task state that the nanosleep() will be in. The normal hrtimer_nanosleep() will act the same, but cpu_chill() will call the __hrtimer_nanosleep() directly with the TASK_UNINTERRUPTIBLE state. cpu_chill() only cares that the first sleep happens, and does not care about the state of the restart schedule (in hrtimer_nanosleep_restart). Cc: stable-rt@vger.kernel.org Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Sebastian Andrzej Siewior	f0414724df	kernel/hrtimer: be non-freezeable in cpu_chill() Since we replaced msleep() by hrtimer I see now and then (rarely) this: \| [....] Waiting for /dev to be fully populated... \| ===================================== \| [ BUG: udevd/229 still has locks held! ] \| 3.12.11-rt17 #23 Not tainted \| ------------------------------------- \| 1 lock held by udevd/229: \| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98 \| \| stack backtrace: \| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23 \| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14) \| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc) \| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160) \| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110) \| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38) \| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec) \| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c) \| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50) \| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44) \| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98) \| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc) \| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60) \| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c) \| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c) \| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94) \| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30) \| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48) For now I see no better way but to disable the freezer for the sleep period. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:19 +03:00
Steven Rostedt	58463fcb2b	rt: Make cpu_chill() use hrtimer instead of msleep() Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is called from softirq context, it may block the ksoftirqd() from running, in which case, it may never wake up the msleep() causing the deadlock. I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call, running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be blocked on a lock that irq/74-qla2xxx holds, we have our deadlock. The solution is not to convert the cpu_chill() back to a cpu_relax() as that will re-create a possible live lock that the cpu_chill() fixed earlier, and may also leave this bug open on other softirqs. The fix is to remove the dependency on ksoftirqd from cpu_chill(). That is, instead of calling msleep() that requires ksoftirqd to wake it up, use the hrtimer_nanosleep() code that does the wakeup from hard irq context. \|Looks to be the lock of the block softirq. I don't have the core dump \|anymore, but from what I could tell the ksoftirqd was blocked on the \|block softirq lock, where the block softirq handler did a msleep \|(called by the qla2xxx interrupt handler). \| \|Looking at trigger_softirq() in block/blk-softirq.c, it can do a \|smp_callfunction() to another cpu to run the block softirq. If that \|happens to be the cpu where the qla2xx irq handler is doing the block \|softirq and is in a middle of a msleep(), I believe the ksoftirqd will \|try to run the softirq. If it does that, then BOOM, it's deadlocked \|because the ksoftirqd will never run the timer softirq either. \|I should have also stated that it was only one lock that was involved. 
\|But the lock owner was doing a msleep() that requires a wakeup by \|ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock \|held by the msleep() caller, then you have your deadlock. \| \|It's best not to have any softirqs going to sleep requiring another \|softirq to wake it up. Note, if we ever require a timer softirq to do a \|cpu_chill() it will most definitely hit this deadlock. Cc: stable-rt@vger.kernel.org Found-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bigeasy: add the 4 \| chapters from email] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	b1a75a3071	rt: Introduce cpu_chill() Retry loops on RT might loop forever when the modifying side was preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill() defaults to cpu_relax() for non RT. On RT it puts the looping task to sleep for a tick so the preempted task can make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org	2020-10-14 00:59:18 +03:00
Sebastian Andrzej Siewior	2d535af0db	block: mq: use cpu_light() There is a might-sleep splat because get_cpu() disables preemption and later we grab a lock. As a workaround for this we use get_cpu_light() and an additional lock to prevent taking the same ctx. There is a lock member in the ctx already but there are some functions which do ++ on the member and this works with irq off but on RT we would need the extra lock. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	cc5d4c7047	mm-vmalloc.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	7643ead733	epoll.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	8ed5077c73	x86: Use generic rwsem_spinlocks on -rt Simplifies the separation of anon_rw_semaphores and rw_semaphores for -rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	633db31439	x86: stackprotector: Avoid random pool on rt CPU bringup calls into the random pool to initialize the stack canary. During boot that works nicely even on RT as the might sleep checks are disabled. During CPU hotplug the might sleep checks trigger. Making the locks in random raw is a major PITA, so avoiding the call on RT is the only sensible solution. This is basically the same randomness which we get during boot where the random pool has no entropy and we rely on the TSC randomness. Reported-by: Carsten Emde <carsten.emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Steven Rostedt	2aa6aec664	x86/mce: Defer mce wakeups to threads for PREEMPT_RT We had a customer report a lockup on a 3.0-rt kernel that had the following backtrace: [ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113 [ffff88107fca3f40] rt_spin_lock at ffffffff81499a56 [ffff88107fca3f50] __wake_up at ffffffff81043379 [ffff88107fca3f80] mce_notify_irq at ffffffff81017328 [ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508 [ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1 [ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853 It actually bugged because the lock was taken by the same owner that already had that lock. What happened was the thread that was setting itself on a wait queue had the lock when an MCE triggered. The MCE interrupt does a wake up on its wait list and grabs the same lock. NOTE: THIS IS NOT A BUG ON MAINLINE Sorry for yelling, but as I Cc'd mainline maintainers I want them to know that this is an PREEMPT_RT bug only. I only Cc'd them for advice. On PREEMPT_RT the wait queue locks are converted from normal "spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above). These are not to be taken by hard interrupt context. This usually isn't a problem as most all interrupts in PREEMPT_RT are converted into schedulable threads. Unfortunately that's not the case with the MCE irq. As wait queue locks are notorious for long hold times, we can not convert them to raw_spin_locks without causing issues with -rt. But Thomas has created a "simple-wait" structure that uses raw spin locks which may have been a good fit. Unfortunately, wait queues are not the only issue, as the mce_notify_irq also does a schedule_work(), which grabs the workqueue spin locks that have the exact same issue. Thus, this patch I'm proposing is to move the actual work of the MCE interrupt into a helper thread that gets woken up on the MCE interrupt and does the work in a schedulable context. 
NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET Oops, sorry for yelling again, but I want to stress that I keep the same behavior of mainline when PREEMPT_RT is not set. Thus, this only changes the MCE behavior when PREEMPT_RT is configured. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bigeasy@linutronix: make mce_notify_work() a proper prototype, use kthread_run()] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	f962e0c91a	x86: Convert mce timer to hrtimer mce_timer is started in atomic contexts of cpu bringup. This results in might_sleep() warnings on RT. Convert mce_timer to a hrtimer to avoid this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> fold in: \|From: Mike Galbraith <bitbucket@online.de> \|Date: Wed, 29 May 2013 13:52:13 +0200 \|Subject: [PATCH] x86/mce: fix mce timer interval \| \|Seems mce timer fire at the wrong frequency in -rt kernels since roughly \|forever due to 32 bit overflow. 3.8-rt is also missing a multiplier. \| \|Add missing us -> ns conversion and 32 bit overflow prevention. \| \|Signed-off-by: Mike Galbraith <bitbucket@online.de> \|[bigeasy: use ULL instead of u64 cast] \|Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Sebastian Andrzej Siewior	f9b2a5eee5	fs: jbd2: pull your plug when waiting for space Two cps in parallel managed to stall the ext4 fs. It seems that journal code is either waiting for locks or sleeping waiting for something to happen. This seems similar to what Mike observed on ext3, here is his description: \|With an -rt kernel, and a heavy sync IO load, tasks can jam \|up on journal locks without unplugging, which can lead to \|terminal IO starvation. Unplug and schedule when waiting \|for space. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Mike Galbraith	c80a59e1aa	fs, jbd: pull your plug when waiting for space With an -rt kernel, and a heavy sync IO load, tasks can jam up on journal locks without unplugging, which can lead to terminal IO starvation. Unplug and schedule when waiting for space. Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Theodore Tso <tytso@mit.edu> Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Mike Galbraith	8f3f53adf4	fs: ntfs: disable interrupt only on !RT On Sat, 2007-10-27 at 11:44 +0200, Ingo Molnar wrote: > * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > > > [10138.175796] [<c0105de3>] show_trace+0x12/0x14 > > > [10138.180291] [<c0105dfb>] dump_stack+0x16/0x18 > > > [10138.184769] [<c011609f>] native_smp_call_function_mask+0x138/0x13d > > > [10138.191117] [<c0117606>] smp_call_function+0x1e/0x24 > > > [10138.196210] [<c012f85c>] on_each_cpu+0x25/0x50 > > > [10138.200807] [<c0115c74>] flush_tlb_all+0x1e/0x20 > > > [10138.205553] [<c016caaf>] kmap_high+0x1b6/0x417 > > > [10138.210118] [<c011ec88>] kmap+0x4d/0x4f > > > [10138.214102] [<c026a9d8>] ntfs_end_buffer_async_read+0x228/0x2f9 > > > [10138.220163] [<c01a0e9e>] end_bio_bh_io_sync+0x26/0x3f > > > [10138.225352] [<c01a2b09>] bio_endio+0x42/0x6d > > > [10138.229769] [<c02c2a08>] __end_that_request_first+0x115/0x4ac > > > [10138.235682] [<c02c2da7>] end_that_request_chunk+0x8/0xa > > > [10138.241052] [<c0365943>] ide_end_request+0x55/0x10a > > > [10138.246058] [<c036dae3>] ide_dma_intr+0x6f/0xac > > > [10138.250727] [<c0366d83>] ide_intr+0x93/0x1e0 > > > [10138.255125] [<c015afb4>] handle_IRQ_event+0x5c/0xc9 > > > > Looks like ntfs is kmap()ing from interrupt context. Should be using > > kmap_atomic instead, I think. > > it's not atomic interrupt context but irq thread context - and -rt > remaps kmap_atomic() to kmap() internally. Hm. Looking at the change to mm/bounce.c, perhaps I should do this instead? Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	d067c11142	fs-block-rt-support.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Yong Zhang	6da4ddd9c1	mm: Protect activate_mm() by preempt_[disable&enable]_rt() Use preempt_*_rt instead of local_irq_*_rt or otherwise there will be warning on ARM like below: WARNING: at build/linux/kernel/smp.c:459 smp_call_function_many+0x98/0x264() Modules linked in: [<c0013bb4>] (unwind_backtrace+0x0/0xe4) from [<c001be94>] (warn_slowpath_common+0x4c/0x64) [<c001be94>] (warn_slowpath_common+0x4c/0x64) from [<c001bec4>] (warn_slowpath_null+0x18/0x1c) [<c001bec4>] (warn_slowpath_null+0x18/0x1c) from [<c0053ff8>](smp_call_function_many+0x98/0x264) [<c0053ff8>] (smp_call_function_many+0x98/0x264) from [<c0054364>] (smp_call_function+0x44/0x6c) [<c0054364>] (smp_call_function+0x44/0x6c) from [<c0017d50>] (__new_context+0xbc/0x124) [<c0017d50>] (__new_context+0xbc/0x124) from [<c009e49c>] (flush_old_exec+0x460/0x5e4) [<c009e49c>] (flush_old_exec+0x460/0x5e4) from [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) from [<c009d060>] (search_binary_handler+0x94/0x2a4) [<c009d060>] (search_binary_handler+0x94/0x2a4) from [<c009e8fc>] (do_execve+0x254/0x364) [<c009e8fc>] (do_execve+0x254/0x364) from [<c0010e84>] (sys_execve+0x34/0x54) [<c0010e84>] (sys_execve+0x34/0x54) from [<c000da00>] (ret_fast_syscall+0x0/0x30) ---[ end trace 0000000000000002 ]--- The reason is that ARM needs irq enabled when doing activate_mm(). According to mm-protect-activate-switch-mm.patch, actually preempt_[disable\|enable]_rt() is sufficient. Inspired-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1337061236-1766-1-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	b4f3b7b714	fs: namespace preemption fix On RT we cannot loop with preemption disabled here as mnt_make_readonly() might have been preempted. We can safely enable preemption while waiting for MNT_WRITE_HOLD to be cleared. Safe on !RT as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Ingo Molnar	797bad0fa7	rt: Improve the serial console PASS_LIMIT Beyond the warning: drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable] the solution of just looping infinitely was ugly - up it to 1 million to give it a chance to continue in some really ugly situation. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	9e1befb3d6	drivers-tty-pl011-irq-disable-madness.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	c564c05244	drivers-tty-fix-omap-lock-crap.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:18 +03:00
Ingo Molnar	1c81d2eb01	serial: 8250: Clean up the locking for -rt In -RT the spin_lock_irqsave() does not spin but sleep if the lock is taken. Before that, local_irq_save() is invoked which disables interrupts even on -RT. Therefore local_irq_save() + spin_lock() does not work. In the ->sysrq and oops_in_progress case it is safe to trylock the lock i.e. this is what we do now anyway except for ->sysrq where we assume that the lock is already taken. The spin_lock_irqsave() grabs the lock and disables the interrupts on vanilla (the same behavior) and on -RT it won't disable interrupts. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: add a patch description] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Mike Galbraith	29c09674c6	stomp-machine: use lg_global_trylock_relax() to dead with stop_cpus_lock lglock If the stop machinery is called from inactive CPU we cannot use lg_global_lock(), because some other stomp machine invocation might be in progress and the lock can be contended. We cannot schedule from this context, so use the lovely new lg_global_trylock_relax() primitive to do what we used to do via one mutex_trylock()/cpu_relax() loop. We now do that trylock()/relax() across an entire herd of locks. Joy. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Mike Galbraith	8be91122a8	stomp-machine: create lg_global_trylock_relax() primitive Create lg_global_trylock_relax() for use by stopper thread when it cannot schedule, to deal with stop_cpus_lock, which is now an lglock. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:18 +03:00
Thomas Gleixner	121435cad6	lglocks-rt.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:17 +03:00
Tiejun Chen	22775b1bef	rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs() Any callers to the function rcu_preempt_qs() must disable irqs in order to protect the assignment to ->rcu_read_unlock_special. In RT case, rcu_bh_qs() as the wrapper of rcu_preempt_qs() is called in some scenarios where irq is enabled, like this path, do_single_softirq() \| + local_irq_enable(); + handle_softirq() \| \| \| + rcu_bh_qs() \| \| \| + rcu_preempt_qs() \| + local_irq_disable() So here we'd better disable irq directly inside of rcu_bh_qs() to fix this, otherwise the kernel may be freezable sometimes as observed. And especially this way is also kind and safe for the potential rcu_bh_qs() usage elsewhere in the future. Cc: stable-rt@vger.kernel.org Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Bin Jiang <bin.jiang@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Paul E. McKenney	010ec70629	rcu: Make ksoftirqd do RCU quiescent states Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable to network-based denial-of-service attacks. This patch therefore makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq() is running in ksoftirqd context. A wrapper layer is interposed so that other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying function __do_softirq_common() does the actual work. The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is that there might be a local_bh_enable() inside an RCU-preempt read-side critical section. This local_bh_enable() can invoke __do_softirq() directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be an illegal RCU-preempt quiescent state in the middle of an RCU-preempt read-side critical section. Therefore, quiescent states can only happen in cases where __do_softirq() is invoked directly from ksoftirqd. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:17 +03:00
Thomas Gleixner	482b29e64e	rcu-more-fallout.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:17 +03:00
Thomas Gleixner	04b4e502bb	rcu: Merge RCU-bh into RCU-preempt The Linux kernel has long RCU-bh read-side critical sections that intolerably increase scheduling latency under mainline's RCU-bh rules, which include RCU-bh read-side critical sections being non-preemptible. This patch therefore arranges for RCU-bh to be implemented in terms of RCU-preempt for CONFIG_PREEMPT_RT_FULL=y. This has the downside of defeating the purpose of RCU-bh, namely, handling the case where the system is subjected to a network-based denial-of-service attack that keeps at least one CPU doing full-time softirq processing. This issue will be fixed by a later commit. The current commit will need some work to make it appropriate for mainline use, for example, it needs to be extended to cover Tiny RCU. [ paulmck: Added a useful changelog ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:17 +03:00
Peter Zijlstra	9f1c3d0f92	rcu: Frob softirq test With RT_FULL we get the below wreckage: [ 126.060484] ======================================================= [ 126.060486] [ INFO: possible circular locking dependency detected ] [ 126.060489] 3.0.1-rt10+ #30 [ 126.060490] ------------------------------------------------------- [ 126.060492] irq/24-eth0/1235 is trying to acquire lock: [ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060503] [ 126.060504] but task is already holding lock: [ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060511] [ 126.060511] which lock already depends on the new lock. [ 126.060513] [ 126.060514] [ 126.060514] the existing dependency chain (in reverse order) is: [ 126.060516] [ 126.060516] -> #1 (&p->pi_lock){-...-.}: [ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85 [ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f [ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a [ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f [ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde [ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b [ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060551] [ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}: [ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 
126.060580] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429 [ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060611] [ 126.060612] other info that might help us debug this: [ 126.060614] [ 126.060615] Possible unsafe locking scenario: [ 126.060616] [ 126.060617] CPU0 CPU1 [ 126.060619] ---- ---- [ 126.060620] lock(&p->pi_lock); [ 126.060623] lock(&(lock)->wait_lock); [ 126.060625] lock(&p->pi_lock); [ 126.060627] lock(&(lock)->wait_lock); [ 126.060629] [ 126.060629] * DEADLOCK * [ 126.060630] [ 126.060632] 1 lock held by irq/24-eth0/1235: [ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429 [ 126.060638] [ 126.060638] stack backtrace: [ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30 [ 126.060643] Call Trace: [ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a [ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816 [ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99 [ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a [ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73 [ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55 [ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c [ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55 [ 126.060680] [<ffffffff810d9ea3>] ? 
rcu_read_unlock_special+0x9b/0x1c4 [ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29 [ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4 [ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89 [ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5 [ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5 [ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90 [ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429 [ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21 [ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17 [ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26 [ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55 [ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98 [ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20 [ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d [ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f [ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f [ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44 [ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59 [ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af [ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a [ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1 [ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1 [ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10 [ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a [ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe [ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c [ 126.060774] [<ffffffff81509b10>] ? 
gs_change+0xb/0xb Because irq_exit() does: void irq_exit(void) { account_system_vtime(current); trace_hardirq_exit(); sub_preempt_count(IRQ_EXIT_OFFSET); if (!in_interrupt() && local_softirq_pending()) invoke_softirq(); ... } Which triggers a wakeup, which uses RCU, now if the interrupted task has t->rcu_read_unlock_special set, the rcu usage from the wakeup will end up in rcu_read_unlock_special(). rcu_read_unlock_special() will test for in_irq(), which will fail as we just decremented preempt_count with IRQ_EXIT_OFFSET, and in_serving_softirq(), which for PREEMPT_RT_FULL reads: int in_serving_softirq(void) { int res; preempt_disable(); res = __get_cpu_var(local_softirq_runner) == current; preempt_enable(); return res; } Which will thus also fail, resulting in the above wreckage. The 'somewhat' ugly solution is to open-code the preempt_count() test in rcu_read_unlock_special(). Also, we're not at all sure how ->rcu_read_unlock_special gets set here... so this is very likely a bandaid and more thought is required. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2020-10-14 00:59:17 +03:00
Sebastian Andrzej Siewior	4ddcb3378a	timer: do not spin_trylock() on UP This will avoid a warning coming from the spin-lock debugging code. The lock avoiding idea is from Steven Rostedt. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Sebastian Andrzej Siewior	493d266dbd	rtmutex: use a trylock for waiter lock in trylock Mike Galbraith captured the following: \| >#11 [ffff88017b243e90] _raw_spin_lock at ffffffff815d2596 \| >#12 [ffff88017b243e90] rt_mutex_trylock at ffffffff815d15be \| >#13 [ffff88017b243eb0] get_next_timer_interrupt at ffffffff81063b42 \| >#14 [ffff88017b243f00] tick_nohz_stop_sched_tick at ffffffff810bd1fd \| >#15 [ffff88017b243f70] tick_nohz_irq_exit at ffffffff810bd7d2 \| >#16 [ffff88017b243f90] irq_exit at ffffffff8105b02d \| >#17 [ffff88017b243fb0] reschedule_interrupt at ffffffff815db3dd \| >--- <IRQ stack> --- \| >#18 [ffff88017a2a9bc8] reschedule_interrupt at ffffffff815db3dd \| > [exception RIP: task_blocks_on_rt_mutex+51] \| >#19 [ffff88017a2a9ce0] rt_spin_lock_slowlock at ffffffff815d183c \| >#20 [ffff88017a2a9da0] lock_timer_base.isra.35 at ffffffff81061cbf \| >#21 [ffff88017a2a9dd0] schedule_timeout at ffffffff815cf1ce \| >#22 [ffff88017a2a9e50] rcu_gp_kthread at ffffffff810f9bbb \| >#23 [ffff88017a2a9ed0] kthread at ffffffff810796d5 \| >#24 [ffff88017a2a9f50] ret_from_fork at ffffffff815da04c lock_timer_base() does a try_lock() which deadlocks on the waiter lock, not the lock itself. This patch takes the waiter_lock with trylock so it should work from interrupt context as well. If the fastpath doesn't work and the waiter_lock itself is taken then it seems that the lock itself is taken. This patch also adds "rt_spin_unlock_after_trylock_in_irq" to keep lockdep happy. If we managed to take the wait_lock in the first place we should also be able to take it in the unlock path. Cc: stable-rt@vger.kernel.org Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Steven Rostedt	dd2550f702	timer/rt: Always raise the softirq if there's irq_work to be done It was previously discovered that some systems would hang on boot up with a previous version of 3.12-rt. This was due to RCU using irq_work, and RT defers the irq_work to a softirq. But if there's no active timers, the softirq will not be raised, and RCU work will not get done, causing the system to hang. The fix was to check that if there was no active timers but irq_work to be done, then we should raise the softirq. But this fix was not 100% correct. It left out the case that there were active timers that were not expired yet. This would have the softirq not get raised even if there was irq work to be done. If there is irq_work to be done, then we must raise the timer softirq regardless of if there is active timers or whether they are expired or not. The softirq can handle those cases. But we can never ignore irq_work. As it is only PREEMPT_RT_FULL that requires irq_work to be done in the softirq, we can pull out the check in the active_timers condition, and make the code a bit cleaner by having the irq_work check separate, and put the code in with the other #ifdef PREEMPT_RT. If there is irq_work to be done, there's no need to check the active timers or if they are expired. Just raise the time softirq and be done with it. Otherwise, we can do the timer checks just like we do with non -rt. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Steven Rostedt	98916bc05c	timer: Raise softirq if there's irq_work [ Talking with Sebastian on IRC, it seems that doing the irq_work_run() from the interrupt in -rt is a bad thing. Here we simply raise the softirq if there's irq work to do. This too boots on my i7 ] After trying hard to figure out why my i7 box was locking up with the new active_timers code, that does not run the timer softirq if there are no active timers, I took an extra look at the softirq handler and noticed that it doesn't just run timer softirqs, it also runs irq work. This was the bug that was locking up the system. It wasn't missing a timer, it was missing irq work. By always doing the irq work callbacks, the system boots fine. The missing irq work callback was the RCU's sp_wakeup() function. No need to check for defined(CONFIG_IRQ_WORK). When that's not set the "irq_work_needs_cpu()" is a static inline that returns false. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Thomas Gleixner	d3b8578de8	timers: do not raise softirq unconditionally Mike, On Thu, 7 Nov 2013, Mike Galbraith wrote: > On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote: > > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote: > > > > I bet you are trying to work around some of the side effects of the > > > occasional tick which is still necessary despite of full nohz, right? > > > > Nope, I wanted to check out cost of nohz_full for rt, and found that it > > doesn't work at all instead, looked, and found that the sole running > > task has just awakened ksoftirqd when it wants to shut the tick down, so > > that shutdown never happens. > > Like so in virgin 3.10-rt. Box is x3550 M3 booted nowatchdog > rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via > cpusets as well. well, that very same problem is in mainline if you add "threadirqs" to the command line. But we can be smart about this. The untested patch below should address that issue. If that works on mainline we can adapt it for RT (needs a trylock(&base->lock) there). Though it's not a full solution. It needs some thought versus the softirq code of timers. Assume we have only one timer queued 1000 ticks into the future. So this change will cause the timer softirq not to be called until that timer expires and then the timer softirq is going to do 1000 loops until it catches up with jiffies. That's anything but pretty ... What worries me more is this one: pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU] The CPU has no callbacks as you shoved them over to cpu 0, so why is the RCU softirq raised? Thanks, tglx ------------------ Message-id: <alpine.DEB.2.02.1311071158350.23353@ionos.tec.linutronix.de> \|CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Thomas Gleixner	8c09ef3177	timer-handle-idle-trylock-in-get-next-timer-irq.patch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:17 +03:00
John Kacur	d4001bfc52	rwlocks: Fix section mismatch This fixes the following build error for the preempt-rt kernel. make kernel/fork.o CC kernel/fork.o kernel/fork.c:90: error: section of tasklist_lock conflicts with previous declaration make[2]: * [kernel/fork.o] Error 1 make[1]: * [kernel/fork.o] Error 2 The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default. The non-rt kernels explicitly cache align only the tasklist_lock in kernel/fork.c That can create a build conflict. This fixes the build problem by making the non-rt kernels cache align RWLOCKs by default. The side effect is that the other RWLOCKs are also cache aligned for non-rt. This is a short term solution for rt only. The longer term solution would be to push the cache aligned DEFINE_RWLOCK to mainline. If there are objections, then we could create a DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature. Comments? Objections? Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-10-14 00:59:17 +03:00
Nicholas Mc Guire	8bd895ef4c	bad return value in __mutex_lock_check_stamp Bad return value in __mutex_lock_check_stamp. This problem would only show up with 3.12.1 rt4 applied but CONFIG_PREEMPT_RT_FULL not enabled. Currently it would return whatever vprintk_emit ended up with (at least on x86), which is probably not the intended behavior. Added a return 0; as in the case with CONFIG_PREEMPT_RT_FULL enabled. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Sebastian Andrzej Siewior	a367c3cbb5	rtmutex: add a first shot of ww_mutex lockdep says: \| -------------------------------------------------------------------------- \| \| Wound/wait tests \| \| --------------------- \| ww api failures: ok \| ok \| ok \| \| ww contexts mixing: ok \| ok \| \| finishing ww context: ok \| ok \| ok \| ok \| \| locking mismatches: ok \| ok \| ok \| \| EDEADLK handling: ok \| ok \| ok \| ok \| ok \| ok \| ok \| ok \| ok \| ok \| \| spinlock nest unlocked: ok \| \| ----------------------------------------------------- \| \|block \| try \|context\| \| ----------------------------------------------------- \| context: ok \| ok \| ok \| \| try: ok \| ok \| ok \| \| block: ok \| ok \| ok \| \| spinlock: ok \| ok \| ok \| Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>	2020-10-14 00:59:17 +03:00
Sebastian Andrzej Siewior	6dc75cfbe3	percpu-rwsem: compile fix The shortcut on mainline skip lockdep. No idea why this is a good thing. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Nicholas Mc Guire	a9200d7cde	rt: Cleanup of unnecessary do while 0 in read/write _lock() With the migration pushdown, a few of the do{ }while(0) loops became obsolete but got left over. This patch only removes this fallout. Patch applies on top of 3.12.9-rt13 Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2020-10-14 00:59:17 +03:00
Steven Rostedt	6ab7c49428	rwlock: disable migration before taking a lock If there are no complaints about it, I'm going to add this to the 3.12-rt stable tree. Without it, it fails horribly with the cpu hotplug stress test, and I won't release a stable kernel that does that. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2020-10-14 00:59:17 +03:00

431653 Commits All Branches Search
