Commit Graph


Author SHA1 Message Date
Sebastian Andrzej Siewior 6967cfe198 Bluetooth: Acquire sk_lock.slock without disabling interrupts
[ Upstream commit e6da0edc24 ]

There was a lockdep splat which led to commit
   fad003b6c8 ("Bluetooth: Fix inconsistent lock state with RFCOMM")

Lockdep noticed that `sk->sk_lock.slock' was acquired without disabling
softirqs while the lock was also used in softirq context. The solution
back then was to disable interrupts before acquiring the lock, which
made lockdep happy but was more than necessary: it would have been
enough to simply disable softirqs. Disabling interrupts before
acquiring a spinlock_t is not allowed on PREEMPT_RT because these locks
are converted to 'sleeping' spinlocks.

Use spin_lock_bh() to acquire `sk_lock.slock'.
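
A rough sketch of the shape of the change (not the verbatim diff; the
surrounding code is assumed):

  #include <linux/spinlock.h>
  #include <net/sock.h>

  static void sk_lock_example(struct sock *sk)
  {
          /* Old: IRQs disabled around a spinlock_t -- forbidden on
           * PREEMPT_RT, where spinlock_t is a sleeping lock:
           *
           *   spin_lock_irqsave(&sk->sk_lock.slock, flags);
           *   ...
           *   spin_unlock_irqrestore(&sk->sk_lock.slock, flags);
           */

          /* New: disabling bottom halves is sufficient, because the
           * lock is only contended from softirq context, and this
           * works on RT as well. */
          spin_lock_bh(&sk->sk_lock.slock);
          /* ... touch socket state ... */
          spin_unlock_bh(&sk->sk_lock.slock);
  }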

Cc: stable-rt@vger.kernel.org
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Reported-by: kbuild test robot <lkp@intel.com> [missing unlock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 58d58190a1 workqueue: Sync with upstream
This is an all-in-one patch reverting the following commits:
  workqueue: Don't assume that the callback has interrupts disabled
  sched/swait: Add swait_event_lock_irq()
  workqueue: Use swait for wq_manager_wait
  workqueue: Convert the locks to raw type

and introducing the following commits from upstream:
  workqueue: Use rcuwait for wq_manager_wait
  workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t

as a replacement.
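
A minimal sketch of the rcuwait pattern those upstream commits switch
to (the manager_wait/manager_active names here are illustrative, not
from the patch):

  #include <linux/rcuwait.h>
  #include <linux/sched.h>

  static struct rcuwait manager_wait = __RCUWAIT_INITIALIZER(manager_wait);
  static bool manager_active;

  static void manager_sleep(void)
  {
          /* Sleeps until the condition holds; unlike swait, no lock
           * of its own is needed around the wait. */
          rcuwait_wait_event(&manager_wait, !READ_ONCE(manager_active),
                             TASK_UNINTERRUPTIBLE);
  }

  static void manager_release(void)
  {
          WRITE_ONCE(manager_active, false);
          rcuwait_wake_up(&manager_wait);
  }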

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Matt Fleming f958e52689 signal: Prevent double-free of user struct
The way user struct reference counting works changed significantly with

  fda31c5029 ("signal: avoid double atomic counter increments for user accounting")

Now user structs are only freed once the last pending signal is
dequeued. Make sigqueue_free_current() follow this new convention to
avoid freeing the user struct multiple times and triggering this
warning:

 refcount_t: underflow; use-after-free.
 WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
 Call Trace:
  refcount_dec_and_lock_irqsave+0x16/0x60
  free_uid+0x31/0xa0
  __dequeue_signal+0x17c/0x190
  dequeue_signal+0x5a/0x1b0
  do_sigtimedwait+0x208/0x250
  __x64_sys_rt_sigtimedwait+0x6f/0xd0
  do_syscall_64+0x72/0x200
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 5dd3c8665f mm/zswap: Use local lock to protect per-CPU data
This is an incremental update of the zswap patch. Additional spots
which were lacking proper locking were identified during the rework of
the patch for upstream.
The complete patch description is available as commit
   79410590ae87e ("mm/zswap: Use local lock to protect per-CPU data")
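
The pattern, roughly (a sketch along the lines of the upstream commit;
field names may differ in detail):

  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct zswap_comp {
          local_lock_t lock;      /* protects the per-CPU scratch buffer */
          u8 *dstmem;
  };

  static DEFINE_PER_CPU(struct zswap_comp, zswap_comp) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  static void zswap_compress_example(void)
  {
          u8 *dst;

          /* On !RT this disables preemption, like get_cpu_var(); on RT
           * it takes a per-CPU spinlock and stays preemptible. */
          local_lock(&zswap_comp.lock);
          dst = this_cpu_ptr(&zswap_comp)->dstmem;
          /* ... compress into dst ... */
          local_unlock(&zswap_comp.lock);
  }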

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
汪勇10269566 8ea11731c2 printk: Force a line break on pr_cont("\n")
Since the printk rework, pr_cont("\n") will not lead to a line break.
A new line will only be created if
- cpu != c->cpu_owner || !(flags & LOG_CONT)
- c->len + len > sizeof(c->buf)

Flush the buffer to enforce a new line on pr_cont().
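
Illustrative usage (not from the patch):

  int i;

  pr_info("checking units:");
  for (i = 0; i < 4; i++)
          pr_cont(" %d", i);
  pr_cont("\n");          /* now flushes the buffer and ends the line */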

[bigeasy: reword commit message ]

Signed-off-by: 汪勇10269566 <wang.yong12@zte.com.cn>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Kevin Hao 71174ad5ad mm: slub: Always flush the delayed empty slubs in flush_all()
After commit f0b231101c94 ("mm/SLUB: delay giving back empty slubs to
IRQ enabled regions"), when free_slab() is invoked with IRQs disabled,
the empty slubs are moved to a per-CPU list and freed later, once IRQs
are enabled again. But in the current code there is a check whether the
CPU in question really has a cpu slub before the delayed empty slubs
are flushed. This may leave a reference to an already released
kmem_cache in a scenario like the one below:
	cpu 0				cpu 1
  kmem_cache_destroy()
    flush_all()
                         --->IPI       flush_cpu_slab()
                                         flush_slab()
                                           deactivate_slab()
                                             discard_slab()
                                               free_slab()
                                             c->page = NULL;
      for_each_online_cpu(cpu)
        if (!has_cpu_slab(1, s))
          continue
        this skips flushing the delayed
        empty slub released by cpu1
    kmem_cache_free(kmem_cache, s)

                                       kmalloc()
                                         __slab_alloc()
                                            free_delayed()
                                            __free_slab()
                                            reference to released kmem_cache

Fixes: f0b231101c94 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
Liwei Song 4ed63caeb8 mm: Don't warn about atomic memory allocations during suspend
The ACPI code allocates larger amount of memory during resume. This
triggers a warning because the allocation happens with disabled
interrupts.
At this stage only one CPU is active so there should be no lock
contention. If SLUB needs to call into the buddy allocator for more
memory then it should not enable interrupts.

Limit the check to system state with more CPUs and scheduling and only
enable interrupts in SLUB at this stage.
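
A sketch of the resulting gate in the allocator (the exact condition
lives in the patch; the system_state/gfpflags_allow_blocking() usage
here is an assumption):

  /* Only re-enable interrupts for the call into the buddy allocator
   * once the system is fully up; during boot/suspend/resume a single
   * CPU is active and there is no contention to worry about. */
  if (system_state == SYSTEM_RUNNING && gfpflags_allow_blocking(flags))
          local_irq_enable();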

Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bigeasy: commit description, allocate_slab() hunk]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Sebastian Andrzej Siewior 7f850a018e fs/dcache: Include swait.h header
Include the swait.h header so it compiles even if not all patches are
applied.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
John Ogness cbf31f0ca6 printk: console must not schedule for drivers
Even though the printk kthread is always preemptible, it is still not
allowed to call cond_resched() from within console drivers. The
task may become non-preemptible in the console driver call chain. For
example, vt_console_print() takes a spinlock and then can call into
fbcon_redraw(), which can conditionally invoke cond_resched():

|BUG: sleeping function called from invalid context at kernel/printk/printk.c:2322
|in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 177, name: printk
|CPU: 0 PID: 177 Comm: printk Not tainted 5.6.2-00011-ga536059557f1d9 #1
|Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
|Call Trace:
| dump_stack+0x66/0x8b
| ___might_sleep+0x102/0x120
| console_conditional_schedule+0x24/0x30
| fbcon_redraw+0x96/0x1c0
| fbcon_scroll+0x556/0xd70
| con_scroll+0x147/0x1e0
| lf+0x9e/0xb0
| vt_console_print+0x253/0x3d0
| printk_kthread_func+0x1d5/0x3b0

Disable cond_resched() for the call into the console drivers.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
Thomas Gleixner 7c4ece8254 Add localversion for -RT release
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Clark Williams 894114838d sysfs: Add /sys/kernel/realtime entry
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing
uname output is too slow.

Are there better solutions? Should it exist and return 0 on !-rt?
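
A sketch of what such an entry looks like (standard kobject attribute
boilerplate; the names are assumed from the sysfs path):

  #include <linux/init.h>
  #include <linux/kobject.h>
  #include <linux/sysfs.h>

  static ssize_t realtime_show(struct kobject *kobj,
                               struct kobj_attribute *attr, char *buf)
  {
          return sprintf(buf, "%d\n", 1);
  }

  static struct kobj_attribute realtime_attr = __ATTR_RO(realtime);

  static int __init rt_sysfs_init(void)
  {
          /* creates /sys/kernel/realtime, which reads as "1" */
          return sysfs_create_file(kernel_kobj, &realtime_attr.attr);
  }
  late_initcall(rt_sysfs_init);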

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2023-03-25 04:21:35 +03:00
Ingo Molnar 77c67bcd8a genirq: Disable irqpoll on -rt
Creates long latencies for no value

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Thomas Gleixner f9d7455782 signals: Allow rt tasks to cache one sigqueue struct
To avoid allocation, allow rt tasks to cache one sigqueue struct in
the task struct.
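
A sketch of the caching idea (the sigqueue_cache field and the helper
names are illustrative):

  #include <linux/sched.h>
  #include <linux/signal.h>

  /* One spare struct sigqueue cached per task, exchanged atomically so
   * the signal send path can skip the slab allocator. */
  static struct sigqueue *sigqueue_cache_get(struct task_struct *t)
  {
          return xchg(&t->sigqueue_cache, NULL);
  }

  static bool sigqueue_cache_put(struct task_struct *t, struct sigqueue *q)
  {
          /* keep the entry only if the slot is still empty */
          return cmpxchg(&t->sigqueue_cache, NULL, q) == NULL;
  }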

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Haris Okanovic fdeac550bc tpm_tis: fix stall after iowrite*()s
ioread8() operations to TPM MMIO addresses can stall the CPU when
immediately following a sequence of iowrite*()s to the same region.

For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in the CPU or somewhere along the bus) and gets flushed on the
first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to the chip across multiple
instructions.
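
Roughly (helper names follow the description; TPM_ACCESS() is the
chip's access register offset):

  /* Pair every posted write with a harmless read from the same region
   * so buffered writes drain immediately, instead of stalling a later,
   * latency-critical ioread8(). */
  static inline void tpm_tis_flush(void __iomem *iobase)
  {
          ioread8(iobase + TPM_ACCESS(0));
  }

  static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
  {
          iowrite8(b, iobase + addr);
          tpm_tis_flush(iobase);
  }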

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Julia Cartwright d04a7ec191 squashfs: make use of local lock in multi_cpu decompressor
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, while ensuring
that concurrent accesses to the per-CPU compressor data on the local
CPU are serialized.

Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Mike Galbraith bc7cc87312 drivers/zram: Don't disable preemption in zcomp_stream_get/put()
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Mike Galbraith c6cc729f9a drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Mike Galbraith caa6dbd8c9 connector/cn_proc: Protect send_msg() with a local lock on RT
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G        W   E   4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Broken since ab8ed95108 ("connector: fix out-of-order cn_proc netlink
message delivery"), which is in v4.7-rc6.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Thomas Gleixner 8ca7ff188f mips: Disable highmem on RT
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior a0aa1749a6 POWERPC: Allow to enable RT
Allow to select RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior 17dfb2be0b powerpc/stackprotector: work around stack-guard init from atomic
This is invoked from the secondary CPU in atomic context. On x86 we
use the TSC instead. On Power we XOR it against mftb(), so let's use
the stack address as the initial value.
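
A sketch of the idea in boot_init_stack_canary()-style code (masking
details vary by tree):

  /* Seed from a value that is cheap and safe to read in atomic
   * context: the stack address is unique per CPU, and XORing the
   * timebase adds boot-to-boot variation. */
  unsigned long canary = (unsigned long)&canary;

  canary ^= mftb();
  canary &= CANARY_MASK;
  current->stack_canary = canary;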

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Thomas Gleixner a45b2dd64c powerpc: Disable highmem on RT
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:34 +03:00
Bogdan Purcareata d9914f69e3 powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.

This temporary fix is sent for two reasons. First, so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies and can
make sure that their usage scenario does not involve interrupts sent in
directed delivery mode and that the number of possible pending interrupts is
kept small. Secondly, this should incentivize the development of a proper
openpic emulation that would be better suited for RT.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior 05eaf518d6 powerpc/pseries/iommu: Use a locallock instead of local_irq_save()
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).

Use local_irq_save() instead of local_irq_disable().

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior 6ffc2c3fa0 ARM64: Allow to enable RT
Allow to select RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior 9982aa5284 ARM: Allow to enable RT
Allow to select RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior fb6d3e5727 x86: Enable RT also on 32bit
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Benedikt Spranger f4787c6f97 clocksource: TCLIB: Allow higher clock rates for clock events
By default the TCLIB uses the 32KiHz base clock rate for clock events.
Add a compile time selection to allow higher clock resolution.

(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)

Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior 1f42e9afaf arm: at91: do not disable/enable clocks in a row
Currently the driver will disable the clock and enable it one line later
when switching from periodic mode into one-shot mode.
This is avoidable and causes a needless warning on -RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior f7183c56ce arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
fpsimd_flush_thread() invokes kfree() via sve_free() within a
preempt-disabled section, which does not work on -RT.

Delay freeing of memory until preemption is enabled again.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Josh Cartwright 92fa9c766d KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU.  It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.

On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible.  Update
kvm_arch_vcpu_ioctl_run() to do so.
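
Schematically (function names abridged from the arm/arm64 KVM run loop
of that era):

  /* Old: preempt_disable() pinned the task but made the whole region
   * non-preemptible. migrate_disable() keeps the flushes on this CPU
   * while staying preemptible on -rt. */
  migrate_disable();
  kvm_vgic_flush_hwstate(vcpu);
  kvm_timer_flush_hwstate(vcpu);
  /* ... enter the guest ... */
  migrate_enable();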

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Josh Cartwright e1267649bf genirq: update irq_set_irqchip_state documentation
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU.  Update the
irq_set_irqchip_state() documentation to reflect this.

Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Yadi.hu d284ad2bed ARM: enable irq in translation/section permission fault handlers
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP

This simple program....

int main() {
   *((char*)0xc0001000) = 0;
};

[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)

0xc0000000 belongs to the kernel address space; a user task cannot be
allowed to access it. Under this condition, the correct result is that
the test case receives a "segmentation fault" and exits, rather than
producing the splat above.

The root cause is commit 02fe2845d6 ("avoid enabling interrupts in
prefetch/data abort handlers"): it deletes the irq enable block in the
data abort assembly code and moves it into the page/breakpoint/alignment
fault handlers instead. But the author did not enable irqs in the
translation/section permission fault handlers. ARM disables irqs when it
enters exception/interrupt mode, so if the kernel doesn't enable them,
they remain disabled during translation/section permission faults.

We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:

        spin_lock_irqsave(&t->sighand->siglock, flags);

As this is architecture-independent code, and we have not seen any
need on other architectures to convert the siglock to a raw lock, we
can conclude that we should enable irqs in the ARM translation/section
permission fault handlers.
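
Roughly the change for the section permission handler (the translation
fault handler gets the same treatment):

  static int
  do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
  {
          /* Mirror the other fault handlers: re-enable irqs before
           * walking into code that takes sleeping locks on the
           * signal-delivery path. */
          if (interrupts_enabled(regs))
                  local_irq_enable();

          do_bad_area(addr, fsr, regs);
          return 0;
  }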

Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Sebastian Andrzej Siewior 30af6ad285 arm: include definition for cpumask_t
This definition gets pulled in by other files. With the (later) split of
RCU and spinlock.h it won't compile anymore.
The split is done in ("rbtree: don't include the rcu header").

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Kurt Kanzenbach 9fb9b835bb tty: serial: pl011: explicitly initialize the flags variable
Silence the following gcc warning:

drivers/tty/serial/amba-pl011.c: In function ‘pl011_console_write’:
./include/linux/spinlock.h:260:3: warning: ‘flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   _raw_spin_unlock_irqrestore(lock, flags); \
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/tty/serial/amba-pl011.c:2214:16: note: ‘flags’ was declared here
  unsigned long flags;
                ^~~~~

The code is correct. Thus, initializing flags to zero doesn't change the
behavior and resolves the warning.
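
In code (a one-line sketch):

  unsigned long flags = 0;   /* init silences -Wmaybe-uninitialized;
                                every path that reads flags also sets it */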

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Thomas Gleixner 6d505b3185 tty/serial/pl011: Make the locking work on RT
The lock is a sleeping lock and local_irq_save() is not the optimisation
we are looking for. Redo it to make it work on -RT and non-RT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:33 +03:00
Thomas Gleixner a52d193315 tty/serial/omap: Make the locking RT aware
The lock is a sleeping lock and local_irq_save() is not the
optimisation we are looking for. Redo it to make it work on -RT and
non-RT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:33 +03:00
Sebastian Andrzej Siewior dcc1d1f96e leds: trigger: disable CPU trigger on -RT
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Thomas Gleixner 8ac0c19511 jump-label: disable if stop_machine() is used
Some architectures use stop_machine() while switching the opcode, which
leads to latency spikes.
The architectures which use stop_machine() atm:
- ARM stop machine
- s390 stop machine

The architectures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Sebastian Andrzej Siewior 3af424b7f4 tracing: make preempt_lazy and migrate_disable counter smaller
The migrate_disable counter should not exceed 255, so it is enough to
store it in an 8-bit field.
With this change we can move the `preempt_lazy_count' member into the
gap, so the whole struct shrinks by 4 bytes to 12 bytes in total.
Remove the `padding' field; it is not needed.
Update the tracing fields in trace_define_common_fields() (it was
missing the preempt_lazy_count field).
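
A sketch of the resulting layout (assumed field order; the point is
that both counters are 8 bit and pack into the existing hole):

  struct trace_entry {
          unsigned short  type;
          unsigned char   flags;
          unsigned char   preempt_count;
          int             pid;
          unsigned char   migrate_disable;     /* never exceeds 255 */
          unsigned char   preempt_lazy_count;  /* moved into the gap */
  };    /* 12 bytes instead of 16 */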

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Anders Roxell e56519fc8a arch/arm64: Add lazy preempt support
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.

Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2023-03-25 04:21:33 +03:00
Thomas Graziadei 96beb1321b powerpc: Fix lazy preemption for powerpc 32bit
The 32bit powerpc assembly implementation of lazy preemption set
_TIF_PERSYSCALL_MASK on the low word. This could lead to modprobe
segfaults and a "kernel panic - not syncing: Attempt to kill init!"
issue.

Fixed by shifting the mask by 16 bits using andis and lis.

Signed-off-by: Thomas Graziadei <thomas.graziadei@omicronenergy.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:33 +03:00
Thomas Gleixner 8991e6ef10 powerpc: Add support for lazy preemption
Implement the powerpc pieces for lazy preempt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:33 +03:00
Thomas Gleixner c8ad6665ef arm: Add support for lazy preemption
Implement the arm pieces for lazy preempt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:33 +03:00
Thomas Gleixner 637863f69e x86: Support for lazy preemption
Implement the x86 pieces for lazy preempt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:33 +03:00
Thomas Gleixner 78ed450f36 sched: Add support for lazy preemption
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which right away
preempt the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the waker. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.

Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.

So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Apart from the
existing preempt_count, each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect, as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).

Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock-held region before
the woken task preempts. That also works better for cross-CPU wakeups
as the other side can stay in the adaptive spinning loop.

For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.

Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)

The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:

 # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features

and reenabled via

 # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features

The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non-RT workload
performance.
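
The decision point, schematically (helper names are illustrative; the
real code sits in the wakeup-preemption path):

  /* RT-class wakee: preempt immediately, as before. */
  if (rt_task(p)) {
          set_tsk_need_resched(curr);
          return;
  }

  /* SCHED_OTHER preempting SCHED_OTHER: record the wish lazily so the
   * waker can leave its lock-held region before being preempted. */
  set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);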

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:32 +03:00
Thomas Gleixner f6da4ae1f3 mm/scatterlist: Do not disable irqs on RT
For -RT it is enough to keep page faults disabled (which is currently
handled by kmap_atomic()).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:32 +03:00
Thomas Gleixner cd57734517 arm: Enable highmem for rt
Fix up highmem for ARM.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:32 +03:00
Sebastian Andrzej Siewior d63ae71591 arm/highmem: Flush tlb on unmap
The TLB should be flushed on unmap, thus making the mapping entry
invalid. Currently this is only done in the non-debug case, which does
not look right.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00
Sebastian Andrzej Siewior 07acc78592 x86/highmem: Add a "already used pte" check
This is a copy from kmap_atomic_prot().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:32 +03:00