Commit Graph

Author SHA1 Message Date
Peter Zijlstra 3b130445a5 mm, rt: kmap_atomic scheduling
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use, of course); this should be especially
easy now that we have a kmap_atomic stack.

Something like the below: it requires replacing all the preempt_disable()
usage with pagefault_disable() && migrate_disable(), of course, but then
you can flip kmaps around like below.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins

[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
		     and the pte content right away in the task struct.
		     Shortens the context switch code. ]
2023-03-25 04:21:32 +03:00
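
As an aside, a minimal sketch of the pattern the commit above describes, replacing a preempt_disable() based kmap_atomic section with pagefault_disable() plus migrate_disable(). The function and its arguments are illustrative, not part of the actual patch:

#include <linux/highmem.h>
#include <linux/preempt.h>
#include <linux/string.h>
#include <linux/uaccess.h>

/* Illustrative only: copy data into a highmem page while staying on one
 * CPU. On RT the task stays preemptible; the kmap_atomic slots are
 * saved/restored by the scheduler on a context switch. */
static void copy_to_highmem_page(struct page *page, const void *src, size_t len)
{
	void *vaddr;

	migrate_disable();	/* pin to this CPU, remain preemptible */
	pagefault_disable();	/* keep the atomic-kmap fault semantics */
	vaddr = kmap_atomic(page);
	memcpy(vaddr, src, len);
	kunmap_atomic(vaddr);
	pagefault_enable();
	migrate_enable();
}
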
Sebastian Andrzej Siewior 3922736135 x86: Allow to enable RT
Allow selecting RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00
David Miller 53755fc090 bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled
On an RT kernel down_read_trylock() cannot be used from NMI context, and
up_read_non_owner() is another problem.

So in such a configuration, simply elide the annotated stackmap and
just report the raw IPs.

In the longer term, it might be possible to provide an atomic-friendly
version of the page cache traversal which will at least report whether
the pages are resident and don't need to be paged in.

[ tglx: Use IS_ENABLED() to avoid the #ifdeffery, fixup the irq work
  	callback and add a comment ]

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00
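
A rough sketch of the IS_ENABLED() approach the commit above describes: skip the build-ID lookup, which would need mmap_sem, and fall back to raw instruction pointers. The helper name and shape are assumptions, not the actual stackmap code:

#include <linux/kernel.h>
#include <linux/preempt.h>
#include <linux/sched.h>

/* Illustrative: decide whether the build-ID path may be taken. On RT with
 * interrupts disabled (e.g. in NMI), down_read_trylock(&mm->mmap_sem) is
 * not allowed, so the caller reports raw IPs instead. */
static bool stack_map_may_use_build_id(struct task_struct *task)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || irqs_disabled()))
		return false;
	return task && task->mm;
}
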
Thomas Gleixner 36ee95a49d bpf, lpm: Make locking RT friendly
The LPM trie map cannot be used in contexts like perf, kprobes and tracing
as this map type dynamically allocates memory.

The memory allocation happens with a raw spinlock held, which on a
PREEMPT_RT enabled kernel is a truly spinning lock that disables preemption
and interrupts.

As RT does not allow memory allocation from such a section for various
reasons, convert the raw spinlock to a regular spinlock.

On an RT enabled kernel these locks are substituted by 'sleeping' spinlocks
which provide the proper protection but keep the code preemptible.

On a non-RT kernel regular spinlocks map to raw spinlocks, i.e. this does
not cause any functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00
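
Concretely, the conversion described above is a type and API switch; a before/after sketch, with the field name assumed for illustration:

/* before: truly spinning, even on RT */
raw_spinlock_t lock;
raw_spin_lock_irqsave(&trie->lock, irq_flags);
/* ... update the trie ... */
raw_spin_unlock_irqrestore(&trie->lock, irq_flags);

/* after: 'sleeping' spinlock on RT, mapped to a raw spinlock on !RT */
spinlock_t lock;
spin_lock_irqsave(&trie->lock, irq_flags);
/* ... update the trie ... */
spin_unlock_irqrestore(&trie->lock, irq_flags);
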
Thomas Gleixner b1e96321f4 bpf: Prepare hashtab locking for PREEMPT_RT
PREEMPT_RT forbids certain operations like memory allocations (even with
GFP_ATOMIC) from atomic contexts. This is required because even with
GFP_ATOMIC the memory allocator calls into code paths which acquire locks
with long held lock sections. To ensure the deterministic behaviour these
locks are regular spinlocks, which are converted to 'sleepable' spinlocks
on RT. The only true atomic contexts on an RT kernel are the low level
hardware handling, scheduling, low level interrupt handling, NMIs etc. None
of these contexts should ever do memory allocations.

As regular device interrupt handlers and soft interrupts are forced into
thread context, the existing code which does
  spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
just works.

In theory the BPF locks could be converted to regular spinlocks as well,
but the bucket locks and percpu_freelist locks can be taken from arbitrary
contexts (perf, kprobes, tracepoints) which are required to be atomic
contexts even on RT. These mechanisms require preallocated maps, so there
is no need to invoke memory allocations within the lock held sections.

BPF maps which need dynamic allocation are only used from (forced) thread
context on RT and can therefore use regular spinlocks, which in turn allows
memory allocations to be invoked from the lock held section.

To achieve this make the hash bucket lock a union of a raw and a regular
spinlock and initialize and lock/unlock either the raw spinlock for
preallocated maps or the regular variant for maps which require memory
allocations.

On a non RT kernel this distinction is neither possible nor required.
spinlock maps to raw_spinlock, and the extra code and conditional are
optimized out by the compiler. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00
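
A condensed sketch of the bucket lock scheme the commit above describes; the names follow the description and may not match the final code exactly:

struct bucket {
	struct hlist_nulls_head head;
	union {
		raw_spinlock_t raw_lock;	/* preallocated maps          */
		spinlock_t     lock;		/* maps which allocate memory */
	};
};

static inline void htab_lock_bucket(const struct bpf_htab *htab,
				    struct bucket *b, unsigned long *pflags)
{
	unsigned long flags;

	if (htab_is_prealloc(htab))		/* assumed helper */
		raw_spin_lock_irqsave(&b->raw_lock, flags);
	else
		spin_lock_irqsave(&b->lock, flags);
	*pflags = flags;
}
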
Thomas Gleixner 45fa008a46 bpf: Factor out hashtab bucket lock operations
As a preparation for making the BPF locking RT friendly, factor out the
hash bucket lock operations into inline functions. This allows the
necessary RT modification to be done in one place instead of sprinkling it all over
the place. No functional change.

The now unused htab argument of the lock/unlock functions will be used in
the next step which adds PREEMPT_RT support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00
Thomas Gleixner 0c01246e9e bpf: Replace open coded recursion prevention in sys_bpf()
The required protection is that the caller cannot be migrated to a
different CPU as these functions end up in places which take either a hash
bucket lock or might trigger a kprobe inside the memory allocator. Both
scenarios can lead to deadlocks. The deadlock prevention is per CPU by
incrementing a per CPU variable which temporarily blocks the invocation of
BPF programs from perf and kprobes.

Replace the open coded preempt_[dis|en]able and __this_cpu_[inc|dec] pairs
with the new helper functions. These functions are already prepared to make
BPF work on PREEMPT_RT enabled kernels. No functional change for !RT
kernels.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00
Thomas Gleixner 5b7f1c0fbf bpf: Provide recursion prevention helpers
The places which need to prevent the execution of trace type BPF programs
in order to avoid deadlocks on the hash bucket lock currently do this open coded.

Provide two inline functions, bpf_disable/enable_instrumentation() to
replace these open coded protection constructs.

Use migrate_disable/enable() instead of preempt_disable/enable() right away
so this works on RT enabled kernels. On a !RT kernel migrate_disable /
enable() are mapped to preempt_disable/enable().

These helpers use this_cpu_inc/dec() instead of __this_cpu_inc/dec() on an
RT enabled kernel because migrate disabled regions are preemptible and
preemption might hit in the middle of a RMW operation which can lead to
inconsistent state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
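
A sketch of what the helpers look like, following the description above (migrate_disable() plus a per-CPU counter); treat it as an approximation rather than the exact kernel code:

static inline void bpf_disable_instrumentation(void)
{
	migrate_disable();
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		this_cpu_inc(bpf_prog_active);	/* region stays preemptible on RT */
	else
		__this_cpu_inc(bpf_prog_active);
}

static inline void bpf_enable_instrumentation(void)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		this_cpu_dec(bpf_prog_active);
	else
		__this_cpu_dec(bpf_prog_active);
	migrate_enable();
}
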
David Miller 01b782ec6a bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.
Replace the preemption disable/enable with migrate_disable/enable() to
reflect the actual requirement and to allow PREEMPT_RT to substitute it
with an actual migration disable mechanism which does not disable
preemption.

Including the code paths that go via __bpf_prog_run_save_cb().

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
David Miller 8884961ecd bpf/tests: Use migrate disable instead of preempt disable
Replace the preemption disable/enable with migrate_disable/enable() to
reflect the actual requirement and to allow PREEMPT_RT to substitute it
with an actual migration disable mechanism which does not disable
preemption.

[ tglx: Switched it over to migrate disable ]

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
David Miller f6eb4c80c0 bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites.
All of these cases are strictly of the form:

	preempt_disable();
	BPF_PROG_RUN(...);
	preempt_enable();

Replace this with bpf_prog_run_pin_on_cpu() which wraps BPF_PROG_RUN()
with:

	migrate_disable();
	BPF_PROG_RUN(...);
	migrate_enable();

On non RT enabled kernels this maps to preempt_disable/enable() and on RT
enabled kernels this solely prevents migration, which is sufficient as
there is no requirement to prevent reentrancy to any BPF program from a
preempting task. The only requirement is that the program stays on the same
CPU.

Therefore, this is a trivially correct transformation.

The seccomp loop does not need protection over the whole loop; it only
needs protection per BPF filter program.

[ tglx: Converted to bpf_prog_run_pin_on_cpu() ]

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Thomas Gleixner 9cf419f6fb bpf: Replace cant_sleep() with cant_migrate()
As already discussed in the previous change, which introduced
BPF_RUN_PROG_PIN_ON_CPU(), BPF only requires migration to be disabled in
order to guarantee per-CPUness.

If RT substitutes the preempt disable based migration protection then the
cant_sleep() check will obviously trigger as preemption is not disabled.

Replace it by cant_migrate() which maps to cant_sleep() on a non RT kernel
and will verify that migration is disabled on a full RT kernel.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Thomas Gleixner c8fed7b37d bpf: Provide bpf_prog_run_pin_on_cpu() helper
BPF programs need to run on one CPU to completion as they use per CPU
storage, but according to Alexei they don't need reentrancy protection:
BPF programs running in thread context can always be 'preempted' by hard
and soft interrupts and instrumentation, and the same program can run
concurrently on a different CPU.

The currently used mechanism to ensure CPUness is to wrap the invocation
into a preempt_disable/enable() pair. Disabling preemption also disables
migration for a task.

preempt_disable/enable() is used because there is no explicit way to
reliably disable only migration.

Provide a separate macro to invoke a BPF program which can be used in
migratable task context.

It wraps BPF_PROG_RUN() in a migrate_disable/enable() pair which maps on
non RT enabled kernels to preempt_disable/enable(). On RT enabled kernels
this merely disables migration. Both methods ensure that the invoked BPF
program runs on one CPU to completion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
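
A sketch of the wrapper described above; the in-tree version may differ in details (macro vs. inline, statistics handling):

static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
					  const void *ctx)
{
	u32 ret;

	migrate_disable();	/* preempt_disable() on !RT kernels */
	ret = BPF_PROG_RUN(prog, ctx);
	migrate_enable();
	return ret;
}
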
Thomas Gleixner 60f1dcfbe0 bpf: Dont iterate over possible CPUs with interrupts disabled
pcpu_freelist_populate() is disabling interrupts and then iterates over the
possible CPUs. The reason why this disables interrupts is to silence
lockdep because the invoked ___pcpu_freelist_push() takes spin locks.

Neither the interrupt disabling nor the locking are required in this
function because it's called during initialization and the resulting map is
not yet visible to anything.

Split out the actual push assignment into an inline function, call it from
the loop and remove the interrupt disabling.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
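
Roughly, the split described above provides a lock-free push helper which the populate loop can call directly; the helper name here is an assumption:

/* Lockless push: only safe while the freelist is still private to the
 * populate path and not visible to any other context. */
static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head,
					   struct pcpu_freelist_node *node)
{
	node->next = head->first;
	head->first = node;
}
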
Thomas Gleixner d1a724bf1e perf/bpf: Remove preempt disable around BPF invocation
The BPF invocation from the perf event overflow handler does not require
preemption to be disabled because it is called from NMI or at least hard
interrupt context, which is already non-preemptible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Thomas Gleixner ded67a610a bpf/trace: Remove redundant preempt_disable from trace_call_bpf()
Similar to __bpf_trace_run this is redundant because __bpf_trace_run() is
invoked from a trace point via __DO_TRACE() which already disables
preemption _before_ invoking any of the functions which are attached to a
trace point.

Remove it and add a cant_sleep() check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Alexei Starovoitov 3afab04a11 bpf: disable preemption for bpf progs attached to uprobe
trace_call_bpf() no longer disables preemption on its own.
All callers of this function have to do it explicitly.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Thomas Gleixner bd22595aad bpf/trace: Remove EXPORT from trace_call_bpf()
All callers are built in. There is no point in exporting this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Thomas Gleixner 98a5660da8 bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run()
__bpf_trace_run() disables preemption around the BPF_PROG_RUN() invocation.

This is redundant because __bpf_trace_run() is invoked from a trace point
via __DO_TRACE() which already disables preemption _before_ invoking any of
the functions which are attached to a trace point.

Remove it and add a cant_sleep() check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Thomas Gleixner f067b3a092 bpf: Update locking comment in hashtab code
The comment where the bucket lock is acquired says:

  /* bpf_map_update_elem() can be called in_irq() */

which is not really helpful and, aside from that, does not explain the
subtle details of the hash bucket locks, especially in the context of BPF
and perf, kprobes and tracing.

Add a comment at the top of the file which explains the protection scopes
and the details how potential deadlocks are prevented.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:31 +03:00
Thomas Gleixner 789b06d8de bpf: Enforce preallocation for instrumentation programs on RT
Aside from the general unsafety of run-time map allocation for
instrumentation type programs, RT enabled kernels have another constraint:

The instrumentation programs are invoked with preemption disabled, but the
memory allocator spinlocks cannot be acquired in atomic context because
they are converted to 'sleeping' spinlocks on RT.

Therefore enforce map preallocation for these program types when RT is
enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
Thomas Gleixner dbbbf81012 bpf: Tighten the requirements for preallocated hash maps
The assumption that only programs attached to perf NMI events can deadlock
on memory allocators is wrong. Assume the following simplified callchain:

 kmalloc() from regular non BPF context
  cache empty
   freelist empty
    lock(zone->lock);
     tracepoint or kprobe
      BPF()
       update_elem()
        lock(bucket)
          kmalloc()
           cache empty
            freelist empty
             lock(zone->lock);  <- DEADLOCK

There are other ways which do not involve locking to create wreckage:

 kmalloc() from regular non BPF context
  local_irq_save();
   ...
    obj = slab_first();
     kprobe()
      BPF()
       update_elem()
        lock(bucket)
         kmalloc()
          local_irq_save();
           ...
            obj = slab_first(); <- Same object as above ...

So preallocation _must_ be enforced for all variants of intrusive
instrumentation.

Unfortunately, immediate enforcement would break backwards compatibility, so
for now such programs are still allowed to run, but a one-time warning is
emitted in dmesg and the verifier emits a warning in the verifier log as
well, so developers are made aware of this and can fix their programs
before the enforcement becomes mandatory.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
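
A hedged sketch of the verifier-side check the two preallocation commits describe: hard failure on RT, one-time warning otherwise. The helper names are assumptions for illustration:

/* Illustrative: checked when a tracing-type program references a map. */
if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
		verbose(env, "RT: tracing programs need preallocated hash maps\n");
		return -EINVAL;
	}
	WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
	verbose(env, "trace type BPF program uses run-time allocation\n");
}
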
Thomas Gleixner 54c6cc4bbe sched: Provide cant_migrate()
Some code paths rely on preempt_disable() to prevent migration on a non RT
enabled kernel. These preempt_disable/enable() pairs are substituted by
migrate_disable/enable() pairs or other forms of RT specific protection. On
RT these protections prevent migration but not preemption. Obviously a
cant_sleep() check in such a section will trigger on RT because preemption
is not disabled.

Provide a cant_migrate() macro which maps to cant_sleep() on a non RT
kernel and an empty placeholder for RT for now. The placeholder will be
changed to a proper debug check along with the RT specific migration
protection mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200214161503.070487511@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
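
The shape of the macro, per the description above:

#ifndef CONFIG_PREEMPT_RT
# define cant_migrate()		cant_sleep()
#else
  /* Placeholder until RT gains a proper migration-disabled debug check */
# define cant_migrate()		do { } while (0)
#endif
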
Sebastian Andrzej Siewior a7cda995da apparmor: use a locallock instead preempt_disable()
get_buffers() disables preemption, which acts as a lock for the per-CPU
variable. Since we can't disable preemption here on RT, a local_lock is
used in order to remain on the same CPU and not have more than one user
within the critical section.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
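
For reference, the per-CPU-buffer pattern with a local lock looks roughly like this, using the mainline local_lock API (the RT tree of that era carried its own variant); the structure and names are made up for the example:

#include <linux/local_lock.h>
#include <linux/percpu.h>

struct aa_buffers_example {
	local_lock_t lock;
	char *buf[2];
};

static DEFINE_PER_CPU(struct aa_buffers_example, aa_buffers) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void aa_example_use_buffers(void)
{
	/* !RT: disables preemption; RT: per-CPU spinlock. Either way only
	 * one user of the per-CPU buffers on this CPU at a time. */
	local_lock(&aa_buffers.lock);
	/* ... work on this_cpu_ptr(&aa_buffers)->buf ... */
	local_unlock(&aa_buffers.lock);
}
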
Mike Galbraith e776981391 cpuset: Convert callback_lock to raw_spinlock_t
The two commits below add up to a cpuset might_sleep() splat for RT:

8447a0fee9 cpuset: convert callback_mutex to a spinlock
344736f29b cpuset: simplify cpuset_node_allowed API

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G            E   4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
 ? dump_stack+0x5c/0x81
 ? ___might_sleep+0xf4/0x170
 ? rt_spin_lock+0x1c/0x50
 ? __cpuset_node_allowed+0x66/0xc0
 ? ___slab_alloc+0x390/0x570 <disables IRQs>
 ? anon_vma_fork+0x8f/0x140
 ? copy_page_range+0x6cf/0xb00
 ? anon_vma_fork+0x8f/0x140
 ? __slab_alloc.isra.74+0x5a/0x81
 ? anon_vma_fork+0x8f/0x140
 ? kmem_cache_alloc+0x1b5/0x1f0
 ? anon_vma_fork+0x8f/0x140
 ? copy_process.part.35+0x1670/0x1ee0
 ? _do_fork+0xdd/0x3f0
 ? _do_fork+0xdd/0x3f0
 ? do_syscall_64+0x61/0x170
 ? entry_SYSCALL64_slow_path+0x25/0x25

The latter ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which previously prevented such contexts from taking callback_mutex.

One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee9 changelog states:

The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
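
The conversion itself is mechanical; roughly:

/* before: sleeping 'spinlock' on RT, not usable with interrupts disabled */
static DEFINE_SPINLOCK(callback_lock);
spin_lock_irqsave(&callback_lock, flags);

/* after: raw spinlock, safe to take in atomic context even on RT */
static DEFINE_RAW_SPINLOCK(callback_lock);
raw_spin_lock_irqsave(&callback_lock, flags);
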
Mike Galbraith 085e434453 drm/i915/gt: use a LOCAL_IRQ_LOCK in __timeline_mark_lock()
Quoting drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe

    We use a fake timeline->mutex lock to reassure lockdep that the timeline
    is always locked when emitting requests. However, the use inside
    __engine_park() may be inside hardirq and so lockdep now complains about
    the mixed irq-state of the nested lock. Disable irqs around the
    lockdep tracking to keep it happy.

This lockdep appeasement breaks RT because we take sleeping locks between
__timeline_mark_lock()/unlock().  Use a LOCAL_IRQ_LOCK instead.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
Sebastian Andrzej Siewior 2896df71c2 drm/i915: Drop the IRQ-off asserts
The lockdep_assert_irqs_disabled() check is needless. The previous
lockdep_assert_held() check ensures that the lock is acquired, and while
the lock is held lockdep already prints a warning if interrupts are
not disabled when they have to be.
These IRQ-off asserts trigger on PREEMPT_RT because the locks become
sleeping locks and do not really disable interrupts.

Remove lockdep_assert_irqs_disabled().

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
Sebastian Andrzej Siewior 9e0274e3eb drm/i915: Don't disable interrupts for intel_engine_breadcrumbs_irq()
The function intel_engine_breadcrumbs_irq() is always invoked from an interrupt
handler and for that reason it invokes (as an optimisation) only spin_lock()
for locking assuming that the interrupts are already disabled. The
function intel_engine_signal_breadcrumbs() is provided to disable
interrupts while the former function is invoked so that assumption is
also true for callers from preemptible context.

On PREEMPT_RT local_irq_disable() really disables interrupts, and this
forbids invoking spin_lock(), which becomes a sleeping spinlock.

This is also problematic with `threadirqs' in conjunction with
irq_work. With forced threading of the interrupt handler, the handler is
invoked with BH disabled but with interrupts enabled. This is okay and
the lock itself is never acquired in IRQ context. This changes with
irq_work (signal_irq_work()) which _still_ invokes
intel_engine_breadcrumbs_irq() from IRQ context. Lockdep should see this
and complain.

Acquire the locks in intel_engine_breadcrumbs_irq() with the _irqsave()
suffix and let all callers invoke intel_engine_breadcrumbs_irq()
directly instead of using intel_engine_signal_breadcrumbs().

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
Sebastian Andrzej Siewior 565003ea0b drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
The order of the header files is important. If this header file is
included after tracepoint.h has been included then the NOTRACE here becomes
a nop. Currently this happens for two .c files which use the tracepoints
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
Sebastian Andrzej Siewior 3a963c9877 drm/i915: disable tracing on -RT
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events, trace_i915_pipe_update_start() among others, use
functions which acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter(), which also
might acquire a sleeping lock.

Based on this I don't see any other way than to disable these trace points on RT.

Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
Mike Galbraith 698f0c7eec drm,i915: Use local_lock/unlock_irq() in intel_pipe_update_start/end()
[    8.014039] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:918
[    8.014041] in_atomic(): 0, irqs_disabled(): 1, pid: 78, name: kworker/u4:4
[    8.014045] CPU: 1 PID: 78 Comm: kworker/u4:4 Not tainted 4.1.7-rt7 #5
[    8.014055] Workqueue: events_unbound async_run_entry_fn
[    8.014059]  0000000000000000 ffff880037153748 ffffffff815f32c9 0000000000000002
[    8.014063]  ffff88013a50e380 ffff880037153768 ffffffff815ef075 ffff8800372c06c8
[    8.014066]  ffff8800372c06c8 ffff880037153778 ffffffff8107c0b3 ffff880037153798
[    8.014067] Call Trace:
[    8.014074]  [<ffffffff815f32c9>] dump_stack+0x4a/0x61
[    8.014078]  [<ffffffff815ef075>] ___might_sleep.part.93+0xe9/0xee
[    8.014082]  [<ffffffff8107c0b3>] ___might_sleep+0x53/0x80
[    8.014086]  [<ffffffff815f9064>] rt_spin_lock+0x24/0x50
[    8.014090]  [<ffffffff8109368b>] prepare_to_wait+0x2b/0xa0
[    8.014152]  [<ffffffffa016c04c>] intel_pipe_update_start+0x17c/0x300 [i915]
[    8.014156]  [<ffffffff81093b40>] ? prepare_to_wait_event+0x120/0x120
[    8.014201]  [<ffffffffa0158f36>] intel_begin_crtc_commit+0x166/0x1e0 [i915]
[    8.014215]  [<ffffffffa00c806d>] drm_atomic_helper_commit_planes+0x5d/0x1a0 [drm_kms_helper]
[    8.014260]  [<ffffffffa0171e9b>] intel_atomic_commit+0xab/0xf0 [i915]
[    8.014288]  [<ffffffffa00654c7>] drm_atomic_commit+0x37/0x60 [drm]
[    8.014298]  [<ffffffffa00c6fcd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
[    8.014301]  [<ffffffff815f77d9>] ? __ww_mutex_lock+0x39/0x40
[    8.014319]  [<ffffffffa0053b3d>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
[    8.014328]  [<ffffffffa00c8edb>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
[    8.014337]  [<ffffffffa00cae49>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[    8.014346]  [<ffffffffa00caec2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[    8.014390]  [<ffffffffa016dfba>] intel_fbdev_set_par+0x1a/0x60 [i915]
[    8.014394]  [<ffffffff81327dc4>] fbcon_init+0x4f4/0x580
[    8.014398]  [<ffffffff8139ef4c>] visual_init+0xbc/0x120
[    8.014401]  [<ffffffff813a1623>] do_bind_con_driver+0x163/0x330
[    8.014405]  [<ffffffff813a1b2c>] do_take_over_console+0x11c/0x1c0
[    8.014408]  [<ffffffff813236e3>] do_fbcon_takeover+0x63/0xd0
[    8.014410]  [<ffffffff81328965>] fbcon_event_notify+0x785/0x8d0
[    8.014413]  [<ffffffff8107c12d>] ? __might_sleep+0x4d/0x90
[    8.014416]  [<ffffffff810775fe>] notifier_call_chain+0x4e/0x80
[    8.014419]  [<ffffffff810779cd>] __blocking_notifier_call_chain+0x4d/0x70
[    8.014422]  [<ffffffff81077a06>] blocking_notifier_call_chain+0x16/0x20
[    8.014425]  [<ffffffff8132b48b>] fb_notifier_call_chain+0x1b/0x20
[    8.014428]  [<ffffffff8132d8fa>] register_framebuffer+0x21a/0x350
[    8.014439]  [<ffffffffa00cb164>] drm_fb_helper_initial_config+0x274/0x3e0 [drm_kms_helper]
[    8.014483]  [<ffffffffa016f1cb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[    8.014486]  [<ffffffff8107912c>] async_run_entry_fn+0x4c/0x160
[    8.014490]  [<ffffffff81070ffa>] process_one_work+0x14a/0x470
[    8.014493]  [<ffffffff81071489>] worker_thread+0x169/0x4c0
[    8.014496]  [<ffffffff81071320>] ? process_one_work+0x470/0x470
[    8.014499]  [<ffffffff81076606>] kthread+0xc6/0xe0
[    8.014502]  [<ffffffff81070000>] ? queue_work_on+0x80/0x110
[    8.014506]  [<ffffffff81076540>] ? kthread_worker_fn+0x1c0/0x1c0

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:30 +03:00
Mike Galbraith 2e63c81011 drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
DRM folks identified the spots, so use them.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:30 +03:00
Sebastian Andrzej Siewior 5b484d670d lockdep: disable self-test
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they have different semantics on RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:30 +03:00
Josh Cartwright 33169b7fd8 lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined.  This leads to warnings
like:

  ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
  ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
  ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
  ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
  ./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
  ...

Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.

Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:29 +03:00
Yong Zhang 7af057bf0d lockdep: selftest: Only do hardirq context test for raw spinlock
On -rt there is no softirq context any more and rwlocks are sleepable, so
disable the softirq context test and the rwlock+irq test.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:29 +03:00
Thomas Gleixner 73ccba39d9 lockdep: Make it RT aware
teach lockdep that we don't really do softirqs on -RT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:29 +03:00
Priyanka Jain 03f93e45e8 net: Remove preemption disabling in netif_rx()
1) enqueue_to_backlog() (called from netif_rx) should be
   bound to a particular CPU. This can be achieved by
   disabling migration. There is no need to disable preemption.

2) Fixes crash "BUG: scheduling while atomic: ksoftirqd"
   in case of RT.
   If preemption is disabled, enqueue_to_backlog() is called
   in atomic context. And if the backlog exceeds its count,
   kfree_skb() is called. But in RT, kfree_skb() might
   get scheduled out, so it expects a non-atomic context.

3) When CONFIG_PREEMPT_RT is not defined,
   migrate_enable() and migrate_disable() map to
   preempt_enable() and preempt_disable(), so there is no
   change in functionality in the non-RT case.

-Replace preempt_enable(), preempt_disable() with
 migrate_enable(), migrate_disable() respectively
-Replace get_cpu(), put_cpu() with get_cpu_light(),
 put_cpu_light() respectively

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:29 +03:00
Thomas Gleixner ad4a585864 random: Make it work on rt
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:29 +03:00
Thomas Gleixner 320874063c x86: stackprotector: Avoid random pool on rt
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot, where the random pool has no
entropy and we rely on TSC randomness.

Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:29 +03:00
Thomas Gleixner 721360ec3a panic: skip get_random_bytes for RT_FULL in init_oops_id
Disable this on -RT. If it is invoked from irq context we will have
problems acquiring the sleeping lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:29 +03:00
Sebastian Andrzej Siewior bb62447a8d crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
cryptd has a per-CPU lock which is protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race window where we could be migrated to another CPU
after the cpu_queue has been obtained. This is not a problem because the
actual resource is protected by the spinlock.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:29 +03:00
Sebastian Andrzej Siewior 1abf66c4fa crypto: limit more FPU-enabled sections
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
  should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
  kmap_atomic().

The whole kernel_fpu_begin()/end() processing probably isn't that cheap.
It most likely makes sense to process as much of it as possible in one
go. The new *_fpu_sched_rt() schedules only if an RT task is pending.

We should probably measure the performance of those ciphers in pure SW
mode and with these optimisations to see whether it makes sense to keep
them for RT.

This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:29 +03:00
Sebastian Andrzej Siewior 3c2f62fa95 crypto: Reduce preempt disabled regions, more algos
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()

and his backtrace showed some crypto functions which looked fine.

The problem is the following sequence:

glue_xts_crypt_128bit()
{
	blkcipher_walk_virt(); /* normal migrate_disable() */

	glue_fpu_begin(); /* get atomic */

	while (nbytes) {
		__glue_xts_crypt_128bit();
		blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
					* while we are atomic */
	};
	glue_fpu_end() /* no longer atomic */
}

and this is why the counter gets out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shorten the FPU off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt the performance because we now
enable/disable the FPU state more often but we gain lower latency and
the bug is gone.

Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:29 +03:00
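
Schematically, following the pseudo code in the message above, the fix keeps the FPU region inside the loop body instead of around the whole walk:

/* schematic only: enable the FPU strictly around the SIMD work */
while (nbytes) {
	glue_fpu_begin();		/* get atomic */
	__glue_xts_crypt_128bit();	/* one chunk of SIMD work */
	glue_fpu_end();			/* preemptible again */

	nbytes = blkcipher_walk_done();	/* may now run preemptibly */
}
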
Peter Zijlstra d75c6cd91c x86: crypto: Reduce preempt disabled regions
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.

This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.

Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:29 +03:00
Sebastian Andrzej Siewior 87f8ee14f4 irqwork: push most work into softirq context
Initially we deferred all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.

This patch adds the IRQ_WORK_HARD_IRQ flag which makes sure the callback runs
in raw-irq context. Everything else is deferred into softirq context. Without
-RT we have the original behavior.

This patch incorporates tglx's original work, reworked a little to bring back
arch_irq_work_raise() where possible, and a few fixes from Steven Rostedt and
Mike Galbraith.

[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
          hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:29 +03:00
Sebastian Andrzej Siewior 89b200404b net: dev: always take qdisc's busylock in __dev_xmit_skb()
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority, then the higher-priority task won't be
able to submit packets to the NIC directly; instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.

If we always take the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 1af6dfa9c1 net: Use skbufhead with raw lock
Use the rps lock as a raw lock so we can keep irq-off regions. It looks low
latency. However, we can't kfree() from this context; therefore we defer this
to the softirq and use the tofree_queue list for it (similar to process_queue).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 65c332a043 debugobjects: Make RT aware
Avoid filling the pool / allocating memory with irqs off.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
Thomas Gleixner 2cac4318fb net: Use cpu_chill() instead of cpu_relax()
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00
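
cpu_chill() is an RT-tree helper (roughly a short sleep rather than a busy-wait hint), so a retry loop then looks like this sketch, with some_lock as a placeholder:

/* Sketch: retry until the (possibly preempted) modifying side is done.
 * On !RT kernels the RT tree maps cpu_chill() back to cpu_relax(). */
while (!spin_trylock(&some_lock))
	cpu_chill();	/* sleep briefly so the lock holder can run */
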
Thomas Gleixner 15387f991a fs: namespace: Use cpu_chill() in trylock loops
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:28 +03:00