Commit Graph

431755 Commits

Author SHA1 Message Date
Mike Galbraith 53d7783841 x86: UV: raw_spinlock conversion
Shrug.  Lots of hobbyists have a beast in their basement, right?

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Gustavo Bittencourt dc678d68b7 rtmutex: enable deadlock detection in ww_mutex_lock functions
The functions ww_mutex_lock_interruptible and ww_mutex_lock should return -EDEADLK when faced with
a deadlock. To do so, the parameter detect_deadlock in rt_mutex_slowlock must be TRUE.
This patch corrects potential deadlocks when running PREEMPT_RT with the nouveau driver.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Gustavo Bittencourt <gbitten@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
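The -EDEADLK return value only helps if callers implement the documented wound/wait back-off. A kernel-style sketch of that pattern follows (not a standalone program; "struct obj" is hypothetical and embeds a struct ww_mutex named "lock", and error paths are simplified):

/*
 * Sketch of the documented ww_mutex back-off that depends on -EDEADLK
 * being reported.  The caller is assumed to do ww_acquire_init() /
 * ww_acquire_done() and, after unlocking, ww_acquire_fini().
 */
static int lock_pair(struct obj *a, struct obj *b, struct ww_acquire_ctx *ctx)
{
        int ret;

        ret = ww_mutex_lock(&a->lock, ctx);
        if (ret)
                return ret;

        ret = ww_mutex_lock(&b->lock, ctx);
        if (ret == -EDEADLK) {
                /* An older transaction holds b: back off, then sleep on b. */
                ww_mutex_unlock(&a->lock);
                ww_mutex_lock_slow(&b->lock, ctx);
                /* Retry a; a full implementation loops on further -EDEADLK. */
                ret = ww_mutex_lock(&a->lock, ctx);
                if (ret)
                        ww_mutex_unlock(&b->lock);
        } else if (ret) {
                ww_mutex_unlock(&a->lock);
        }
        return ret;
}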
Mike Galbraith 0d54782ed8 rt,locking: fix __ww_mutex_lock_interruptible() lockdep annotation
Using mutex_acquire_nest() as used in __ww_mutex_lock() fixes the
splat below.  Remove superfluous line break in __ww_mutex_lock()
as well.

|=============================================
|[ INFO: possible recursive locking detected ]
|3.14.4-rt5 #26 Not tainted
|---------------------------------------------
|Xorg/4298 is trying to acquire lock:
| (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
|but task is already holding lock:
| (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]
|other info that might help us debug this:
| Possible unsafe locking scenario:
|       CPU0
|       ----
|  lock(reservation_ww_class_mutex);
|  lock(reservation_ww_class_mutex);
|
| *** DEADLOCK ***
|
| May be due to missing lock nesting notation
|
|3 locks held by Xorg/4298:
| #0:  (&cli->mutex){+.+.+.}, at: [<ffffffffa02b597b>] nouveau_abi16_get+0x2b/0x100 [nouveau]
| #1:  (reservation_ww_class_acquire){+.+...}, at: [<ffffffffa0160cd2>] drm_ioctl+0x4d2/0x610 [drm]
| #2:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa02b4270>] nouveau_gem_ioctl_pushbuf+0x870/0x19f0 [nouveau]

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Brad Mouring c9ac083943 rtmutex.c: Fix incorrect waiter check
In task_blocks_on_rt_mutex(), there's a NULL check on pi_blocked_on
of the task_struct. This pointer can encode the fact that the
task that contains it is waking (preventing requeuing) and is
therefore non-NULL without pointing at a real waiter. Use the inline
helper to avoid dereferencing an invalid "pointer"

Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Reported-by: Ben Shelton <ben.shelton@ni.com>
Reviewed-by: T Makphaibulchoke <tmac@hp.com>
Tested-by: T Makphaibulchoke <tmac@hp.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
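For illustration, the inline helper referred to above is, in the -rt series of that era, a small filter for the wakeup-in-progress sentinel; a kernel-style sketch from memory (names may differ slightly in a given tree, not a standalone program):

/*
 * task->pi_blocked_on may hold a sentinel ("wakeup in progress")
 * instead of a real waiter, so a plain NULL check is not enough.
 */
#define PI_WAKEUP_INPROGRESS    ((struct rt_mutex_waiter *) 1)

static inline bool rt_mutex_real_waiter(struct rt_mutex_waiter *waiter)
{
        return waiter && waiter != PI_WAKEUP_INPROGRESS;
}

static void pi_chain_step(struct task_struct *task)
{
        struct rt_mutex_waiter *waiter = task->pi_blocked_on;

        if (!rt_mutex_real_waiter(waiter))
                return;         /* being woken: do not requeue, do not deref */

        /* ... safe to dereference waiter here ... */
}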
Sebastian Andrzej Siewior 7ffb28ebaa locking/rt-mutex: avoid a NULL pointer dereference on deadlock
With task_blocks_on_rt_mutex() returning -EDEADLK early, we never add the
waiter to the wait queue. Later, we try to remove it via remove_waiter()
and go boom in rt_mutex_top_waiter() because rb_entry() gives a NULL
pointer.
Tested on v3.18-RT, where rtmutex is used for the regular mutex and I
tried to take one twice in a row.

Not sure when this started, but I guess 397335f00 ("rtmutex: Fix deadlock
detector for real") or commit 3d5c9340 ("rtmutex: Handle deadlock
detection smarter").

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Thomas Gleixner ef453cd0d4 futex: Simplify futex_lock_pi_atomic() and make it more robust
upstream commit: af54d6a1c3

futex_lock_pi_atomic() is a maze of retry hoops and loops.

Reduce it to simple and understandable states:

First step is to lookup existing waiters (state) in the kernel.

If there is an existing waiter, validate it and attach to it.

If there is no existing waiter, check the user space value:

If the TID encoded in the user space value is 0, take over the futex
preserving the owner died bit.

If the TID encoded in the user space value is != 0, lookup the owner
task, validate it and attach to it.

Reduces text size by 128 bytes on x86_64.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Cc: Darren Hart <darren@dvhart.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406131137020.5170@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
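The decision order described above reduces to a short cascade. A small userspace model, illustrative only (the ABI constants are the real futex ones, the helper and enum names are stand-ins):

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define FUTEX_TID_MASK   0x3fffffff
#define FUTEX_OWNER_DIED 0x40000000

enum pi_action { ATTACH_TO_WAITER, TAKE_OVER_FUTEX, ATTACH_TO_OWNER };

static enum pi_action lock_pi_atomic_model(uint32_t uval, bool have_kernel_waiter)
{
        /* 1) Existing kernel state (a waiter)?  Validate and attach to it. */
        if (have_kernel_waiter)
                return ATTACH_TO_WAITER;

        /* 2) No kernel state: decide from the user space value. */
        if ((uval & FUTEX_TID_MASK) == 0)
                return TAKE_OVER_FUTEX;         /* keep FUTEX_OWNER_DIED if set */

        /* 3) Non-zero TID: look up the owner task, validate it, attach to it. */
        return ATTACH_TO_OWNER;
}

int main(void)
{
        printf("%d\n", lock_pi_atomic_model(0u | FUTEX_OWNER_DIED, false)); /* 1 */
        printf("%d\n", lock_pi_atomic_model(1234u, false));                 /* 2 */
        return 0;
}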
Thomas Gleixner c3cefde051 futex: Split out the first waiter attachment from lookup_pi_state()
upstream commit: 04e1b2e52b

We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the code which attaches the first
waiter to the owner into a separate function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.271300614@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Thomas Gleixner 16268bd779 futex: Split out the waiter check from lookup_pi_state()
upstream commit: e60cbc5cea

We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the waiter verification into a separate
function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.180458410@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Thomas Gleixner 9e62263a76 futex: Use futex_top_waiter() in lookup_pi_state()
upstream commit: bd1dbcc67c

No point in open coding the same function again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.092947239@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Thomas Gleixner 5e3746259e futex: Make unlock_pi more robust
upstream commit: ccf9e6a80d

The kernel tries to atomically unlock the futex without checking
whether there is kernel state associated with the futex.

So if user space manipulated the user space value, this will leave
kernel-internal state around associated with the owner task.

For robustness' sake, first look up whether there are waiters on the
futex. If there are waiters, wake the top-priority waiter with all the
proper sanity checks applied.

If there are no waiters, do the atomic release. We do not have to
preserve the waiters bit in this case, because a potentially incoming
waiter is blocked on the hb->lock and will acquire the futex
atomically. Nor do we have to preserve the owner-died bit. The caller
is the owner and was supposed to clean up the mess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.016987332@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Thomas Gleixner 41e57dbd40 rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
upstream commit: 67792e2cab

In case the deadlock detector is enabled we follow the lock chain to
the end in rt_mutex_adjust_prio_chain, even if we could stop earlier
due to the priority/waiter constellation.

But once we are no longer the top-priority waiter in a certain step,
or the task holding the lock already has the same priority, there is
no point in dequeueing and enqueueing along the lock chain, as nothing
changes at all.

So stop the requeueing at this point.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20140522031950.280830190@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:24 +03:00
Thomas Gleixner 401bec21e8 rtmutex: Cleanup deadlock detector debug logic
upstream commit: 8930ed80f9

The conditions under which deadlock detection is conducted are unclear
and undocumented.

Add constants instead of using 0/1 and provide a selection function
which hides the additional debug dependency from the calling code.

Add comments where needed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20140522031949.947264874@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	kernel/locking/rtmutex.c
2020-10-14 00:59:24 +03:00
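For reference, the named constants and the selection helper added by the upstream commit look roughly like this (sketch from memory, not a standalone program; the debug helper hides the CONFIG_DEBUG_RT_MUTEXES dependency from callers):

enum rtmutex_chainwalk {
        RT_MUTEX_MIN_CHAINWALK,         /* stop walking once boosting is done */
        RT_MUTEX_FULL_CHAINWALK,        /* walk the full chain for deadlock detection */
};

static enum rtmutex_chainwalk
rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter,
                              enum rtmutex_chainwalk chwalk)
{
        /*
         * With CONFIG_DEBUG_RT_MUTEXES the debug code forces a full
         * chain walk; otherwise the caller's choice is used as-is.
         */
        return debug_rt_mutex_detect_deadlock(waiter, chwalk);
}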
Thomas Gleixner 0f48389512 rtmutex: Confine deadlock logic to futex
upstream commit: c051b21f71

The deadlock logic is only required for futexes.

Remove the extra arguments for the public functions and also for the
futex-specific ones, which always get called with deadlock detection
enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	include/linux/rtmutex.h
	kernel/locking/rtmutex.c
2020-10-14 00:59:23 +03:00
Thomas Gleixner 41e69f2371 rtmutex: Simplify remove_waiter()
upstream commit: 1ca7b86062

Exit right away when the removed waiter was not the top-priority
waiter on the lock. Get rid of the extra indent level.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	kernel/locking/rtmutex.c
2020-10-14 00:59:23 +03:00
Thomas Gleixner 5974d85ed0 rtmutex: Document pi chain walk
upstream commit: 3eb65aeadf

Add commentary to document the chain walk and the protection mechanisms
and their scope.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:23 +03:00
Thomas Gleixner af64a14546 rtmutex: Clarify the boost/deboost part
upstream commit: a57594a13a

Add a separate local variable for the boost/deboost logic to make the
code more readable. Add comments where appropriate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	kernel/locking/rtmutex.c
2020-10-14 00:59:23 +03:00
Thomas Gleixner 5c1c8184e8 rtmutex: No need to keep task ref for lock owner check
upstream commit: 2ffa5a5cd2

There is no point in keeping the task ref across the check for the lock
owner. Drop the ref before that, so the protection context is clear.

Found while documenting the chain walk.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:23 +03:00
Thomas Gleixner b69f25004a rtmutex: Simplify and document try_to_take_rtmutex()
upstream commit: 358c331f39

The current implementation of try_to_take_rtmutex() is correct, but
requires more than a single brain twist to understand the cleverly
encoded conditionals.

Untangle it and document the cases properly.

Looks less efficient at first glance, but actually reduces the
binary code size on x86_64 by 80 bytes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	kernel/locking/rtmutex.c
2020-10-14 00:59:23 +03:00
Thomas Gleixner ba1d3cbb24 rtmutex: Simplify rtmutex_slowtrylock()
upstream-commit: 88f2b4c15e

Oleg noticed that rtmutex_slowtrylock() has a pointless check for
rt_mutex_owner(lock) != current.

To avoid calling try_to_take_rtmutex() we really want to check whether
the lock has an owner at all, or whether the trylock failed because the
owner is NULL but the RT_MUTEX_HAS_WAITERS bit is set. This also covers
the case where the lock is owned by the caller.

We can actually do this check locklessly; a trylock is taking a chance
either way, whether or not we take lock->wait_lock to do the check.

Add comments to the function while at it.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Conflicts:
	kernel/locking/rtmutex.c
2020-10-14 00:59:23 +03:00
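The resulting trylock slow path ends up looking roughly like this (kernel-style sketch from memory, not a standalone program; helper names as used in kernel/locking/rtmutex.c):

static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
{
        int ret;

        /*
         * If the lock already has an owner, the trylock fails.  Reading
         * the owner without lock->wait_lock is fine: a trylock is taking
         * a chance either way.  This also covers "owned by the caller".
         */
        if (rt_mutex_owner(lock))
                return 0;

        /*
         * No owner (or only RT_MUTEX_HAS_WAITERS set): take wait_lock
         * and try for real.
         */
        raw_spin_lock(&lock->wait_lock);
        ret = try_to_take_rt_mutex(lock, current, NULL);
        fixup_rt_mutex_waiters(lock);   /* clear a stray waiters bit */
        raw_spin_unlock(&lock->wait_lock);

        return ret;
}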
Sebastian Andrzej Siewior 88f8541ddd gpio: omap: use raw locks for locking
This patch converts gpio_bank.lock from a spin_lock into a
raw_spin_lock. The call path to access this lock is always under a
raw_spin_lock, for instance:
- __setup_irq() holds &desc->lock with irq off
  + __irq_set_trigger()
   + omap_gpio_irq_type()

- handle_level_irq() (runs with irqs off therefore raw locks)
  + mask_ack_irq()
   + omap_gpio_mask_irq()

This fixes the obvious backtrace on -RT. However, the locking vs.
context issue remains, and it is not limited to -RT:
- omap_gpio_irq_type() is called with IRQs off and has a conditional
  call to pm_runtime_get_sync(), which may sleep. Whether or not it
  actually happens, pm_runtime_get_sync() should not be called with
  IRQs off.

- omap_gpio_debounce() is holding the lock with IRQs off.
  + omap2_set_gpio_debounce()
   + clk_prepare_enable()
    + clk_prepare(), which might sleep.
  The number of users of gpiod_set_debounce() / gpio_set_debounce()
  looks low, but this is still not good.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:23 +03:00
Thomas Gleixner 9673232a79 workqueue: Prevent deadlock/stall on RT
Austin reported an XFS deadlock/stall on RT where scheduled work never
gets executed and tasks wait for each other forever.

The underlying problem is the modification of the RT code to the
handling of workers which are about to go to sleep. In mainline a
worker thread which goes to sleep wakes an idle worker if there is
more work to do. This happens from the guts of the schedule()
function. On RT this must happen outside of the scheduler, and the
accessed data structures are not protected against scheduling due to
the spinlock-to-rtmutex conversion. So the naive solution was to move
the code outside of the scheduler and protect the data structures by
the pool lock. That approach turned out to be a little naive, as we
cannot call into that code when the thread blocks on a lock: it is not
allowed to block on two locks in parallel. So we don't call into the
worker wakeup magic when the worker is blocked on a lock, which causes
the deadlock/stall observed by Austin and Mike.

Looking deeper into that worker code it turns out that the only
relevant data structure which needs to be protected is the list of
idle workers which can be woken up.

So the solution is to protect the list manipulation operations with
preempt_enable/disable pairs on RT and call unconditionally into the
worker code even when the worker is blocked on a lock. The preemption
protection is safe as there is nothing which can fiddle with the list
outside of thread context.

Reported-and-tested-by: Austin Schuh <austin@peloton-tech.com>
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable-rt@vger.kernel.org
2020-10-14 00:59:23 +03:00
Steven Rostedt 504c1e6c4a sched: Do not clear PF_NO_SETAFFINITY flag in select_fallback_rq()
I talked with Peter Zijlstra about this, and he told me that the clearing
of the PF_NO_SETAFFINITY flag was to deal with the optimization of
migrate_disable/enable() that ignores tasks that have that flag set. But
that optimization was removed when I did a rework of the cpu hotplug code.

I found that ignoring tasks that had that flag set would cause those tasks
to not sync with the hotplug code and cause the kernel to crash. Thus they
could not be treated specially, and those tasks had to go through the same
work as tasks without that flag set.

Now that those tasks are not treated specially, there's no reason to clear the
flag.

May still need to be tested as the migrate_me() code does not ignore those
flags.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140701111444.0cfebaa1@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior cd93a88a67 disable preempt lazy on x86-64
it still explodes

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior 28abbe8efe md: disable bcache
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
|  up_read_non_owner(&dc->writeback_lock);
|  ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
|  down_read_non_owner(&dc->writeback_lock);
|  ^

either we get rid of those or we have to introduce them…

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:23 +03:00
Steven Rostedt 646a8ab0b1 rt,ntp: Move call to schedule_delayed_work() to helper thread
The ntp code for notify_cmos_timer() is called from a hard interrupt
context. schedule_delayed_work() under PREEMPT_RT_FULL takes spinlocks
that have been converted to mutexes, so calling schedule_delayed_work()
from interrupt context is not safe.

Add a helper thread that does the call to schedule_delayed_work() and wake
up that thread instead of calling schedule_delayed_work() directly.
This is only for CONFIG_PREEMPT_RT_FULL; otherwise the code still calls
schedule_delayed_work() directly in irq context.

Note: There are a few places in the kernel that do this. Perhaps the RT
code should have a dedicated thread that does the checks. Just register
a notifier on boot-up for your check and wake up the thread when
needed. This remains a todo.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior b1fcb3c08e a few open coded completions
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:23 +03:00
Thomas Gleixner 0d3b12ccc6 completion: Use simple wait queues
Completions have no long-lasting callbacks and therefore do not need
the complex waitqueue variant. Use simple wait queues, which reduces
contention on the waitqueue lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:23 +03:00
Thomas Gleixner a4203240fa rcu-more-swait-conversions.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merged Steven's

 static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp) {
-       swait_wake(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
+       wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
 }

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:23 +03:00
Sebastian Andrzej Siewior 4437a7dee2 kernel/treercu: use a simple waitqueue
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:23 +03:00
Paul Gortmaker ca0179a36a simple-wait: rename and export the equivalent of waitqueue_active()
The function "swait_head_has_waiters()" was internalized into
wait-simple.c but it parallels the waitqueue_active of normal
waitqueue support. Given that there are over 150 waitqueue_active
users in drivers/ fs/ kernel/ and the like, lets make it globally
visible, and rename it to parallel the waitqueue_active accordingly.
We'll need to do this if we expect to expand its usage beyond RT.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:23 +03:00
Thomas Gleixner 65870d64d3 wait-simple: Rework for use with completions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:23 +03:00
Thomas Gleixner 99076b3731 wait-simple: Simple waitqueue implementation
wait_queue is a Swiss army knife, and in most cases the complexity is
not needed. For RT, wait queues are a constant source of trouble, as we
can't convert the head lock to a raw spinlock due to fancy and
long-lasting callbacks.

Provide a slim version, which allows RT to replace wait queues. This
should go mainline as well, as it lowers memory consumption and
runtime overhead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

smp_mb() added by Steven Rostedt to fix a race condition with swait
wakeups vs adding items to the list.
2020-10-14 00:59:22 +03:00
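The core of the slim variant is just a raw spinlock protecting a plain list of sleepers, with a wake path short enough to run under a raw lock. A rough kernel-style sketch (type and function names approximate, not the exact -rt API, not a standalone program):

struct swait_head {
        raw_spinlock_t          lock;
        struct list_head        task_list;
};

struct swaiter {
        struct task_struct      *task;
        struct list_head        node;
};

static void swait_wake_one_locked(struct swait_head *head)
{
        struct swaiter *w;

        if (list_empty(&head->task_list))
                return;

        w = list_first_entry(&head->task_list, struct swaiter, node);
        list_del_init(&w->node);        /* no callbacks, just dequeue ... */
        wake_up_process(w->task);       /* ... and wake the sleeping task */
}

static void swait_wake_one(struct swait_head *head)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&head->lock, flags);
        swait_wake_one_locked(head);
        raw_spin_unlock_irqrestore(&head->lock, flags);
}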
Sebastian Andrzej Siewior e49d664aa7 wait.h: include atomic.h
|  CC      init/main.o
|In file included from include/linux/mmzone.h:9:0,
|                 from include/linux/gfp.h:4,
|                 from include/linux/kmod.h:22,
|                 from include/linux/module.h:13,
|                 from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
|  if (atomic_read(val) == 0)
|  ^

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior 0e144af33c drm/i915: drop trace_i915_gem_ring_dispatch on rt
This tracepoint is responsible for:

|[<814cc358>] __schedule_bug+0x4d/0x59
|[<814d24cc>] __schedule+0x88c/0x930
|[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
|[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
|[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
|[<814d27d9>] schedule+0x29/0x70
|[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
|[<814d3786>] rt_spin_lock+0x26/0x30
|[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
|[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
|[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
|[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
|[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
|[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
|[<8101d063>] ? native_sched_clock+0x13/0x80
|[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
|[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
|[<8101d0d9>] ? sched_clock+0x9/0x10
|[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
|[<810f1c18>] ? rb_commit+0x68/0xa0
|[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
|[<81197467>] do_vfs_ioctl+0x97/0x540
|[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
|[<811979a1>] sys_ioctl+0x91/0xb0
|[<814db931>] tracesys+0xe1/0xe6

Chris Wilson does not like to move i915_trace_irq_get() out of the macro

|No. This enables the IRQ, as well as making a number of
|very expensively serialised read, unconditionally.

so it is gone now on RT.

Cc: stable-rt@vger.kernel.org
Reported-by: Joakim Hernberg <jbh@alchemy.lu>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior 5550291744 gpu/i915: don't open code these things
The open-coded part is gone since 1f83fee0 ("drm/i915: clear up wedged transitions");
the owner check is still there.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner 11fd979912 mmci: Remove bogus local_irq_save()
On !RT the interrupt handler runs with interrupts disabled. On RT it runs
in a thread, so there is no need to disable interrupts at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior b83d9c8fd4 i2c/omap: drop the lock hard irq context
The lock is taken while reading two registers. On RT the lock is first
taken in hard-irq context, where it might sleep, and then again in the
threaded irq. The threaded irq runs in oneshot mode, so the hard-irq
handler does not run again until the thread completes; there is
therefore no reason to grab the lock in hard-irq context.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Sebastian Andrzej Siewior 16a70d5345 leds: trigger: disable CPU trigger on -RT
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner 322d3dd7f7 powerpc-preempt-lazy-support.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner de01f48d58 arm-preempt-lazy-support.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner 5ccf423693 x86-preempt-lazy.patch
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner b33545443e sched: Add support for lazy preemption
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away preempt
the waking task while the waking task holds a lock on which the woken
task will block right after having preempted the waker. In mainline
this is prevented by the implicit preemption disable of spin/rw_lock
held regions. On RT this is not possible due to the fully preemptible
nature of sleeping spinlocks.

Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.

So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect, as for laziness reasons I coupled this to
migrate_disable/enable, so some other mechanisms get the same treatment
(e.g. get_cpu_light).

Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts it. That also works better for cross-CPU wakeups,
as the other side can stay in the adaptive spinning loop.

For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.

 Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)

The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:

 # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features

and re-enabled via

 # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features

The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
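The wakeup-side decision described above boils down to a class check. A small userspace model, illustrative only (flag and policy names mirror the commit text; the real code manipulates task flags and preempt_lazy_count in the scheduler):

#include <stdio.h>
#include <stdbool.h>

enum resched_flag { RESCHED_NONE, RESCHED_LAZY, RESCHED_NOW };

static enum resched_flag wakeup_preempt_model(bool curr_is_rt, bool woken_is_rt,
                                              bool woken_should_preempt)
{
        if (!woken_should_preempt)
                return RESCHED_NONE;

        /* RT class involved: behave exactly as before, preempt right away. */
        if (curr_is_rt || woken_is_rt)
                return RESCHED_NOW;

        /*
         * SCHED_OTHER preempting SCHED_OTHER: defer, so the waker can
         * first leave its lock-held (preempt_lazy_count > 0) region.
         */
        return RESCHED_LAZY;
}

int main(void)
{
        printf("%d\n", wakeup_preempt_model(false, false, true)); /* 1: lazy */
        printf("%d\n", wakeup_preempt_model(false, true,  true)); /* 2: immediate */
        return 0;
}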
Sebastian Andrzej Siewior 1b1950518f rcu: make RCU_BOOST default on RT
Since it is no longer invoked from softirq, people run into OOM more
often if the priority of the RCU thread is too low. Making boosting
the default on RT should help in those cases, and it can be switched
off if someone knows better.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Paul E. McKenney 21f9c8f24c rcu: Eliminate softirq processing from rcutree
Running RCU out of softirq is a problem for some workloads that would
like to manage RCU core processing independently of other softirq work,
for example, setting kthread priority.  This commit therefore moves the
RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread
named rcuc.  The SCHED_OTHER approach avoids the scalability problems
that appeared with the earlier attempt to move RCU core processing
from softirq to kthreads.  That said, kernels built with RCU_BOOST=y
will run the rcuc kthreads at the RCU-boosting priority.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner e69d00b006 rcu: Disable RCU_FAST_NO_HZ on RT
This uses a timer_list timer from the irq disabled guts of the idle
code. Disable it for now to prevent wreckage.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
2020-10-14 00:59:22 +03:00
Nicholas Mc Guire f335b3cd6c softirq: make migrate disable/enable conditioned on softirq_nestcnt transition
This patch removes the recursive calls to migrate_disable/enable in
local_bh_disable/enable.

The softirq-local-lock.patch introduces local_bh_disable/enable, which
increments/decrements current->softirq_nestcnt and disables/enables
migration as well. As softirq_nestcnt (include/linux/sched.h,
conditioned on CONFIG_PREEMPT_RT_BASE) is already tracking the nesting
level of the recursive calls to local_bh_disable/enable (all in
kernel/softirq.c), there is no need to do it twice.

migrate_disable/enable can thus be conditioned on softirq_nestcnt,
making a transition from 0 to 1 disable migration and from 1 to 0
re-enable it.

No change of functional behavior; this noticeably reduces the observed
nesting level of migrate_disable/enable.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
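The 0-to-1 / 1-to-0 conditioning described in the commit above can be modeled in a few lines of userspace C (illustrative only; the real code lives in kernel/softirq.c and calls the actual migrate_disable/enable):

#include <stdio.h>

static __thread int softirq_nestcnt;
static __thread int migrate_disable_depth;      /* stand-in for migrate_disable() state */

static void local_bh_disable_model(void)
{
        if (softirq_nestcnt++ == 0)
                migrate_disable_depth++;        /* only the outermost call disables migration */
}

static void local_bh_enable_model(void)
{
        if (--softirq_nestcnt == 0)
                migrate_disable_depth--;        /* only the outermost call re-enables it */
}

int main(void)
{
        local_bh_disable_model();
        local_bh_disable_model();               /* nested: no second migrate_disable */
        printf("nest=%d migrate=%d\n", softirq_nestcnt, migrate_disable_depth); /* 2 1 */
        local_bh_enable_model();
        local_bh_enable_model();
        printf("nest=%d migrate=%d\n", softirq_nestcnt, migrate_disable_depth); /* 0 0 */
        return 0;
}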
Thomas Gleixner e1bb862d7e softirq: Adapt NOHZ softirq pending check to new RT scheme
We can't rely on ksoftirqd anymore. We need to check the tasks which
run a particular softirq, and if such a task is PI-blocked, ignore the
other pending bits of that task as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
Nicholas Mc Guire d2f09d0ed6 API cleanup - use local_lock not __local_lock for soft
Trivial API cleanup - kernel/softirq.c was mimicking local_lock.

No change of functional behavior.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner f4d0ede105 softirq: Split softirq locks
The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. However, this prevents us from handling the various softirqs
at different priorities. Now, instead of reintroducing the split
softirq threads, we split the locks which serialize the softirq
processing.

If a softirq is raised in the context of a thread, then the softirq is
noted in a per-thread field if the thread is in a bh-disabled
region. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh-disabled region, it tries to execute
the softirqs which have been raised in its own context. It acquires
the per-softirq / per-CPU lock for the softirq and then checks
whether the softirq is still pending in the per-CPU
local_softirq_pending() field. If yes, it runs the softirq. If no,
then some other task executed it already. This allows for zero-config
softirq elevation in the context of user space tasks or interrupt
threads.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00
Thomas Gleixner c8bec60fb0 softirq: Split handling function
Split out the inner handling function, so RT can reuse it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-10-14 00:59:22 +03:00