Commit Graph

892207 Commits

Author SHA1 Message Date
Denis Drakhnia 2b0d079aa6 e2k: fix stat definition 2024-05-27 07:44:00 +03:00
Alibek Omarov c1ce54bceb Linux 5.4.193 with MCST patches (6.2) 2023-03-25 04:34:12 +03:00
Alibek Omarov aeb4689c20 Bundle lttng-modules-2.11.7 2023-03-25 04:23:17 +03:00
Tom Zanussi 34946aa335 Linux 5.4.193-rt74 REBASE
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:38 +03:00
Tom Zanussi 9346d4234f eventfd: Fix stable-rt v5.4.182-rt71 conflict fixup issue
This fixes an issue in stable-rt release v5.4.182-rt71 where a hunk
from the context diff was inadvertently included in a conflict fixup
where it shouldn't have been.  Remove those lines that don't belong.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:38 +03:00
Xie Yongji 3367ff1f04 aio: Fix incorrect usage of eventfd_signal_allowed()
[ Upstream commit 4b3749865374899e115aa8c48681709b086fe6d3 ]

We should defer eventfd_signal() to the workqueue when
eventfd_signal_allowed() returns false rather than true.

Fixes: b542e383d8c0 ("eventfd: Make signal recursion protection a task bit")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:38 +03:00
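A minimal sketch of the corrected pattern, assuming a hypothetical caller and deferred work item (this is not the verbatim fs/aio.c hunk): signal inline only when eventfd_signal_allowed() says it is safe, and punt to the workqueue otherwise.

	#include <linux/eventfd.h>
	#include <linux/workqueue.h>

	/* ctx and deferred_work stand in for the subsystem's own eventfd
	 * context and work item; both are illustrative. */
	static void complete_and_notify(struct eventfd_ctx *ctx,
					struct work_struct *deferred_work)
	{
		if (eventfd_signal_allowed())
			eventfd_signal(ctx, 1);		/* safe to signal inline */
		else
			schedule_work(deferred_work);	/* defer when recursion is possible */
	}
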
Thomas Gleixner e2ea925c8f eventfd: Make signal recursion protection a task bit
[ Upstream commit b542e383d8c005f06a131e2b40d5889b812f19c6 ]

The recursion protection for eventfd_signal() is based on a per CPU
variable and relies on the !RT semantics of spin_lock_irqsave() for
protecting this per CPU variable. On RT kernels spin_lock_irqsave() disables
neither preemption nor interrupts, which allows the spin-lock-held section
to be preempted. If the preempting task invokes eventfd_signal() as well,
then the recursion warning triggers.

Paolo suggested to protect the per CPU variable with a local lock, but
that's heavyweight and actually not necessary. The goal of this protection
is to prevent the task stack from overflowing, which can be achieved with a
per task recursion protection as well.

Replace the per CPU variable with a per task bit similar to other recursion
protection bits like task_struct::in_page_owner. This works on both !RT and
RT kernels and removes as a side effect the extra per CPU storage.

No functional change for !RT kernels.

Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx

Signed-off-by: Tom Zanussi <zanussi@kernel.org>

 Conflicts:
	fs/aio.c
	include/linux/sched.h
2023-03-25 04:21:38 +03:00
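A hedged sketch of the per-task guard this change introduces (simplified; the counter update and wakeup inside eventfd_signal() are elided):

	#include <linux/sched.h>
	#include <linux/eventfd.h>

	static inline bool eventfd_signal_allowed(void)
	{
		return !current->in_eventfd_signal;	/* per-task bit instead of per-CPU count */
	}

	__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
	{
		unsigned long flags;

		if (WARN_ON_ONCE(current->in_eventfd_signal))
			return 0;			/* recursion on this task's stack */

		spin_lock_irqsave(&ctx->wqh.lock, flags);
		current->in_eventfd_signal = 1;
		/* ... update ctx->count and wake the waitqueue as before ... */
		current->in_eventfd_signal = 0;
		spin_unlock_irqrestore(&ctx->wqh.lock, flags);
		return n;
	}
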
Sebastian Andrzej Siewior 793390ed19 locking: Drop might_resched() from might_sleep_no_state_check()
[ Upstream 5.10 commit e88f48e796b2286b565ee95ca8c46f32e051cd8c ]

might_sleep_no_state_check() serves the same purpose as might_sleep()
except it is used before sleeping locks are acquired and therefore does
not check task_struct::state because the state is preserved.

That state is preserved in the locking slow path, so we must not schedule
at the beginning of the locking function because the state would be lost
before it is preserved.

Remove might_resched() from might_sleep_no_state_check() to avoid losing the
state before it is preserved.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
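A sketch of what the change amounts to (surrounding #ifdefs omitted; the macro lives in the RT tree's kernel headers):

	/* Before: debug check plus a voluntary reschedule point. */
	# define might_sleep_no_state_check() \
		do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)

	/* After: only the debug check, so task_struct::state survives until the
	 * locking slow path has saved it. */
	# define might_sleep_no_state_check() \
		do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)
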
Sebastian Andrzej Siewior ac3887f881 fscache: Use only one fscache_object_cong_wait.
[ Upstream commit 514342eb43a760575d6d9a366506a41ab7ec4888 ]

This is an update of the original patch, removing a put_cpu_var() which
was overlooked in the initial patch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior 0dc66825a3 fscache: Use only one fscache_object_cong_wait.
[ Upstream commit 74920695ab51a6d180dcd6554193cc8427758360 ]

In the commit mentioned below, fscache was converted from slow-work to
workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed()
did not use a per-CPU workqueue. They chose between two global waitqueues
depending on the SLOW_WORK_VERY_SLOW bit, which was not set, so it was
always one waitqueue.

I can't find out how it is ensured that a waiter on a certain CPU is woken
up by the other side. My guess is that the timeout in schedule_timeout()
ensures that it does not wait forever (or a random wake up).

fscache_object_sleep_till_congested() must be invoked from preemptible
context in order for schedule() to work. In this case this_cpu_ptr()
should complain with CONFIG_DEBUG_PREEMPT enabled, unless the thread is
bound to one CPU.

wake_up() wakes only one waiter and I'm not sure if it is guaranteed
that only one waiter exists.

Replace the per-CPU waitqueue with one global waitqueue.

Fixes: 8b8edefa2f ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck <gregor.beck@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
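A minimal sketch of the resulting declaration change (illustrative rather than the verbatim hunk):

	/* Before: one waitqueue per CPU, with no obvious cross-CPU wakeup guarantee. */
	static DEFINE_PER_CPU(wait_queue_head_t, fscache_object_cong_wait);

	/* After: a single global waitqueue shared by all waiters. */
	static DECLARE_WAIT_QUEUE_HEAD(fscache_object_cong_wait);
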
Sebastian Andrzej Siewior ce417fbc98 mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT
[ Upstream commit aae93144898af113331668f53f80cb83f5a07360 ]

TRANSPARENT_HUGEPAGE:
There are potential non-deterministic delays to an RT thread if a critical
memory region is not THP-aligned and a non-RT buffer is located in the same
hugepage-aligned region. It's also possible for an unrelated thread to migrate
pages belonging to an RT task incurring unexpected page faults due to memory
defragmentation even if khugepaged is disabled.

Regular HUGEPAGEs are not affected by this and can be used.

NUMA_BALANCING:
There is a non-deterministic delay when marking PTEs PROT_NONE to gather NUMA
fault samples, increased page faults on regions even if they are mlocked, and
non-deterministic delays when migrating pages.

[Mel Gorman worded 99% of the commit description].

Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Cc: stable-rt@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior b9ba466b1b preempt: Move preempt_enable_no_resched() to the RT block
[ Upstream commit 1a45b3551ef852193c3d338888132c4925d0690d ]

preempt_enable_no_resched() should point to preempt_enable() on
PREEMPT_RT so nobody is playing any preempt tricks and enables
preemption without checking for the need-resched flag.

This was misplaced in v3.14.0-rt1 and remained unnoticed until now.

Point preempt_enable_no_resched() to preempt_enable() on RT.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior df6e402903 sched: Switch wait_task_inactive to HRTIMER_MODE_REL_HARD
[ Upstream commit 39609ed79d420e0b966e16a1d695733c2d3b9a7f ]

With PREEMPT_RT enabled all hrtimer callbacks will be invoked in
softirq mode unless they are explicitly marked as HRTIMER_MODE_HARD.
During boot kthread_bind() is used for the creation of per-CPU threads
and then hangs in wait_task_inactive() if the ksoftirqd is not
yet up and running.
The hang disappeared since commit
   26c7295be0 ("kthread: Do not preempt current task if it is going to call schedule()")

but enabling function tracing on boot reliably leads to the freeze-on-boot
behaviour again.
The timer in wait_task_inactive() cannot be used directly from a user
interface to create a mass wakeup of several tasks at the same time,
which would lead to long sections with disabled interrupts.
Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.

Switch the timer to HRTIMER_MODE_REL_HARD.

Cc: stable-rt@vger.kernel.org
Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Mike Galbraith 352375c53b mm, zsmalloc: Convert zsmalloc_handle.lock to spinlock_t
[ Upstream 5.10 commit f2d9006d27c9b12563b8e577951ff5021f3b36b2 ]

local_lock_t becoming a synonym of spinlock_t had consequences for the RT
mods to zsmalloc, which were taking a mutex while holding a local_lock,
inspiring a lockdep "BUG: Invalid wait context" gripe.

Converting zsmalloc_handle.lock to a spinlock_t restored lockdep silence.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
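A hedged sketch of the data-structure side of the change (the RT-only handle wrapper; the exact field layout is illustrative):

	#include <linux/spinlock.h>

	struct zsmalloc_handle {
		unsigned long addr;
		spinlock_t lock;	/* was a struct mutex in the earlier RT mod; a
					 * spinlock_t nests cleanly under local_lock_t */
	};
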
Gregor Beck a0f0e6701c fscache: fix initialisation of cookie hash table raw spinlocks
The original patch, 602660600bcd ("fscache: initialize cookie hash
table raw spinlocks"), subtracted 1 from the shift and so still left
some spinlocks uninitialized.  This fixes that.

[zanussi: Added changelog text]

Signed-off-by: Gregor Beck <gregor.beck@gmail.com>
Fixes: 602660600bcd ("fscache: initialize cookie hash table raw spinlocks")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Andrew Halaney 22562d5988 locking/rwsem-rt: Remove might_sleep() in __up_read()
[ Upstream commit b2ed0a4302faf2bb09e97529dd274233c082689b ]

There's no chance of sleeping here; the reader is giving up the
lock and possibly waking up the writer who is waiting on it.

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior f57378010e mm: slub: Don't resize the location tracking cache on PREEMPT_RT
[ Upstream commit 87bd0bf324f4c5468ea3d1de0482589f491f3145 ]

The location tracking cache has a size of a page and is resized if its
current size is too small.
This allocation happens with disabled interrupts and can't happen on
PREEMPT_RT.
Should one page be too small, then we have to allocate more at the
beginning. The only downside is that fewer callers will be visible.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
2023-03-25 04:21:37 +03:00
Sebastian Andrzej Siewior a43acd1e98 locking/rwsem-rt: Add __down_read_interruptible()
The stable tree backported a patch which adds __down_read_interruptible() for
the generic rwsem implementation.

Add RT's version of __down_read_interruptible().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:37 +03:00
Zanxiong Qiu c2014f70a6 mm/swap: use local lock in deactivate_page()
get_cpu_var() calls preempt_disable(), while on an RT kernel
pagevec_lru_move_fn() takes a spinlock and might schedule the context
out, hence the scheduling bug. The issue was found on 5.4.70-rt40 and is
reproducible on 5.4.74-rt41.

32154a0abcc ("mm: Revert the DEFINE_PER_CPU_PAGEVEC implementation")
reverted the lock/unlock_swap_pvec functions; however, the
deactivate_page() part was missed at that time as it was newly
added in v5.4.

Link: https://lore.kernel.org/r/20201127135456.8145-1-zqiu2000@126.com
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Zanxiong Qiu <zqiu2000@126.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:37 +03:00
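A sketch of the fix in mm/swap.c in the 5.4-rt local-lock style (get_locked_var()/put_locked_var() and swapvec_lock follow the RT tree's conventions; treat the exact identifiers as illustrative):

	void deactivate_page(struct page *page)
	{
		if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
			/* Local lock instead of get_cpu_var(): keeps the pagevec stable
			 * without disabling preemption on RT. */
			struct pagevec *pvec = &get_locked_var(swapvec_lock,
							       lru_deactivate_pvecs);

			get_page(page);
			if (!pagevec_add(pvec, page) || PageCompound(page))
				pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
			put_locked_var(swapvec_lock, lru_deactivate_pvecs);
		}
	}
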
Sebastian Andrzej Siewior 18d5a2ded1 Revert "hrtimer: Allow raw wakeups during boot"
This change is no longer needed since commit
   26c7295be0 ("kthread: Do not preempt current task if it is going to call schedule()")

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:37 +03:00
Steven Rostedt (VMware) 1d895e6e82 Revert "net: Properly annotate the try-lock for the seqlock"
This reverts commit 3971227b5af04e6c34ef7b47b2ebe941727563a0.

Link: https://lore.kernel.org/r/20201116171958.2opbksmgbznrjxu2@linutronix.de

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 5564fa35b3 timers: Don't block on ->expiry_lock for TIMER_IRQSAFE
PREEMPT_RT does not spin and wait until a running timer completes its
callback but instead it blocks on a sleeping lock to prevent a deadlock.

This blocking cannot be done for workqueue's IRQSAFE timer which will
be canceled in an IRQ-off region. It has to happen in an IRQ-off region
because changing the PENDING bit and clearing the timer must not be
interrupted to avoid a busy-loop.

The callback invocation of IRQSAFE timer is not preempted on PREEMPT_RT
so there is no need to synchronize on timer_base::expiry_lock.

Don't acquire the timer_base::expiry_lock for TIMER_IRQSAFE flagged
timer.
Add a lockdep annotation to ensure that this function is always invoked
in preemptible context on PREEMPT_RT.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Oleg Nesterov 6bd4eef1d8 ptrace: fix ptrace_unfreeze_traced() race with rt-lock
The patch "ptrace: fix ptrace vs tasklist_lock race" changed
ptrace_freeze_traced() to take task->saved_state into account, but
ptrace_unfreeze_traced() has the same problem and needs a similar fix:
it should check/update both ->state and ->saved_state.

Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Fixes: "ptrace: fix ptrace vs tasklist_lock race"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior bc0a27b923 mm/memcontrol: Disable preemption in __mod_memcg_lruvec_state()
The callers expect disabled preemption/interrupts while invoking
__mod_memcg_lruvec_state(). This works in mainline because a lock of
some kind is acquired.

Use preempt_disable_rt() where per-CPU variables are accessed and a
stable pointer is expected. This is also done in __mod_zone_page_state()
for the same reason.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 13aea34e50 net: Properly annotate the try-lock for the seqlock
In patch
   ("net/Qdisc: use a seqlock instead seqcount")

the seqcount has been replaced with a seqlock to allow the reader to
boost the preempted writer.
The try_write_seqlock() acquired the lock with a try-lock but the
seqcount annotation was "lock".

Opencode write_seqcount_t_begin() and use the try-lock annotation for
lockdep.

Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
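A hedged sketch of the open-coded try-lock annotation in try_write_seqlock() (the seqlock_t field names and lockdep plumbing differ between trees; this is illustrative only):

	static inline bool try_write_seqlock(seqlock_t *sl)
	{
		if (!spin_trylock(&sl->lock))
			return false;
		/* Open-coded write_seqcount_begin(): same sequence bump, but the
		 * lockdep acquire is annotated as a try-lock (third argument = 1). */
		seqcount_acquire(&sl->seqcount.dep_map, 0, 1, _RET_IP_);
		raw_write_seqcount_begin(&sl->seqcount);
		return true;
	}
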
Sebastian Andrzej Siewior 36c2e4d093 rwsem: Provide down_read_non_owner() and up_read_non_owner() for -RT
The rwsem implementation on -RT allows multiple readers and there is no
owner tracking anymore.
We can provide down_read_non_owner() and up_read_non_owner() by skipping
the owner check bits which are only available in the !RT implementation.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Ahmed S. Darwish 3de29a2a9e net: phy: fixed_phy: Remove unused seqcount
Commit bf7afb29d5 ("phy: improve safety of fixed-phy MII register
reading") protected the fixed PHY status with a sequence counter.

Two years later, commit d2b977939b ("net: phy: fixed-phy: remove
fixed_phy_update_state()") removed the sequence counter's write side
critical section -- neutralizing its read side retry loop.

Remove the unused seqcount.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from v5.8-rc1 commit 79cbb6bc33)
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 6967cfe198 Bluetooth: Acquire sk_lock.slock without disabling interrupts
[ Upstream commit e6da0edc24 ]

There was a lockdep warning which led to commit
   fad003b6c8 ("Bluetooth: Fix inconsistent lock state with RFCOMM")

Lockdep noticed that `sk->sk_lock.slock' was acquired without disabling
the softirq while the lock was also used in softirq context.
Unfortunately the solution back then was to disable interrupts before
acquiring the lock, which did make lockdep happy.
It would have been enough to simply disable the softirq. Disabling
interrupts before acquiring a spinlock_t is not allowed on PREEMPT_RT
because these locks are converted to 'sleeping' spinlocks.

Use spin_lock_bh() in order to acquire the `sk_lock.slock'.

Cc: stable-rt@vger.kernel.org
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Reported-by: kbuild test robot <lkp@intel.com> [missing unlock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
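A minimal sketch of the pattern applied at the affected call sites (surrounding code elided):

	/* Before: not allowed on PREEMPT_RT, where sk_lock.slock becomes a
	 * sleeping spinlock and must not be taken with interrupts disabled. */
	spin_lock_irqsave(&sk->sk_lock.slock, flags);
	/* ... critical section ... */
	spin_unlock_irqrestore(&sk->sk_lock.slock, flags);

	/* After: disabling bottom halves is enough to avoid the softirq-context
	 * deadlock that lockdep originally complained about. */
	spin_lock_bh(&sk->sk_lock.slock);
	/* ... critical section ... */
	spin_unlock_bh(&sk->sk_lock.slock);
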
Sebastian Andrzej Siewior 58d58190a1 workqueue: Sync with upstream
This is an all-in-one patch reverting the following commits:
  workqueue: Don't assume that the callback has interrupts disabled
  sched/swait: Add swait_event_lock_irq()
  workqueue: Use swait for wq_manager_wait
  workqueue: Convert the locks to raw type

and introducing the following commits from upstream:
  workqueue: Use rcuwait for wq_manager_wait
  workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t

as a replacement.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Matt Fleming f958e52689 signal: Prevent double-free of user struct
The way user struct reference counting works changed significantly with,

  fda31c5029 ("signal: avoid double atomic counter increments for user accounting")

Now user structs are only freed once the last pending signal is
dequeued. Make sigqueue_free_current() follow this new convention to
avoid freeing the user struct multiple times and triggering this
warning:

 refcount_t: underflow; use-after-free.
 WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
 Call Trace:
  refcount_dec_and_lock_irqsave+0x16/0x60
  free_uid+0x31/0xa0
  __dequeue_signal+0x17c/0x190
  dequeue_signal+0x5a/0x1b0
  do_sigtimedwait+0x208/0x250
  __x64_sys_rt_sigtimedwait+0x6f/0xd0
  do_syscall_64+0x72/0x200
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Sebastian Andrzej Siewior 5dd3c8665f mm/zswap: Use local lock to protect per-CPU data
This is an incremental update of the zswap patch. Additional spots which
were lacking proper locking were identified during the rework of the
patch for upstream.
The complete patch description is available as commit
   79410590ae87e ("mm/zswap: Use local lock to protect per-CPU data")

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
汪勇10269566 8ea11731c2 printk: Force a line break on pr_cont(" ")
Since the printk rework, pr_cont("\n") will not lead to a line break.
A new line will only be created if
- cpu != c->cpu_owner || !(flags & LOG_CONT)
- c->len + len > sizeof(c->buf)

Flush the buffer to enforce a new line on pr_cont().

[bigeasy: reword commit message ]

Signed-off-by: 汪勇10269566 <wang.yong12@zte.com.cn>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:36 +03:00
Kevin Hao 71174ad5ad mm: slub: Always flush the delayed empty slubs in flush_all()
After commit f0b231101c94 ("mm/SLUB: delay giving back empty slubs to
IRQ enabled regions"), when the free_slab() is invoked with the IRQ
disabled, the empty slubs are moved to a per-CPU list and will be
freed after IRQs are enabled later. But in the current code, there is
a check to see whether there really is a cpu slub on a specific CPU
before flushing the delayed empty slubs; this may cause a reference
to an already released kmem_cache in a scenario like below:
	cpu 0				cpu 1
  kmem_cache_destroy()
    flush_all()
                         --->IPI       flush_cpu_slab()
                                         flush_slab()
                                           deactivate_slab()
                                             discard_slab()
                                               free_slab()
                                             c->page = NULL;
      for_each_online_cpu(cpu)
        if (!has_cpu_slab(1, s))
          continue
        this skip to flush the delayed
        empty slub released by cpu1
    kmem_cache_free(kmem_cache, s)

                                       kmalloc()
                                         __slab_alloc()
                                            free_delayed()
                                            __free_slab()
                                            reference to released kmem_cache

Fixes: f0b231101c94 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
Liwei Song 4ed63caeb8 mm: Don't warn about atomic memory allocations during suspend
The ACPI code allocates a larger amount of memory during resume. This
triggers a warning because the allocation happens with disabled
interrupts.
At this stage only one CPU is active so there should be no lock
contention. If SLUB needs to call into the buddy allocator for more
memory then it should not enable interrupts.

Limit the check to system state with more CPUs and scheduling and only
enable interrupts in SLUB at this stage.

Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bigeasy: commit description, allocate_slab() hunk]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Sebastian Andrzej Siewior 7f850a018e fs/dcache: Include swait.h header
Include the swait.h header so it compiles even if not all patches are
applied.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
John Ogness cbf31f0ca6 printk: console must not schedule for drivers
Even though the printk kthread is always preemptible, it is still not
allowed to call cond_resched() from within console drivers. The
task may become non-preemptible in the console driver call chain. For
example, vt_console_print() takes a spinlock and then can call into
fbcon_redraw(), which can conditionally invoke cond_resched():

|BUG: sleeping function called from invalid context at kernel/printk/printk.c:2322
|in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 177, name: printk
|CPU: 0 PID: 177 Comm: printk Not tainted 5.6.2-00011-ga536059557f1d9 #1
|Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
|Call Trace:
| dump_stack+0x66/0x8b
| ___might_sleep+0x102/0x120
| console_conditional_schedule+0x24/0x30
| fbcon_redraw+0x96/0x1c0
| fbcon_scroll+0x556/0xd70
| con_scroll+0x147/0x1e0
| lf+0x9e/0xb0
| vt_console_print+0x253/0x3d0
| printk_kthread_func+0x1d5/0x3b0

Disable cond_resched() for the call into the console drivers.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2023-03-25 04:21:35 +03:00
Thomas Gleixner 7c4ece8254 Add localversion for -RT release
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Clark Williams 894114838d sysfs: Add /sys/kernel/realtime entry
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2023-03-25 04:21:35 +03:00
Ingo Molnar 77c67bcd8a genirq: Disable irqpoll on -rt
Creates long latencies for no value

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Thomas Gleixner f9d7455782 signals: Allow rt tasks to cache one sigqueue struct
To avoid allocations, allow RT tasks to cache one sigqueue struct in
the task struct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:35 +03:00
Haris Okanovic fdeac550bc tpm_tis: fix stall after iowrite*()s
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.

For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
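A hedged sketch of the approach (helper names modeled on the RT patch; treat them as illustrative):

	/* Read back the access register so the posted write is flushed to the
	 * chip right away instead of stalling a later ioread8(). */
	static inline void tpm_tis_flush(void __iomem *iobase)
	{
		ioread8(iobase + TPM_ACCESS(0));
	}

	static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
	{
		iowrite8(b, iobase + addr);
		tpm_tis_flush(iobase);
	}
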
Julia Cartwright d04a7ec191 squashfs: make use of local lock in multi_cpu decompressor
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.

Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
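A sketch in the 5.4-rt local-lock style (get_locked_ptr()/put_locked_ptr() come from the RT tree's locallock.h; the exact identifiers are illustrative):

	static DEFINE_LOCAL_IRQ_LOCK(stream_lock);

	/* Decompression itself stays preemptible; the local lock only serialises
	 * access to this CPU's stream. */
	stream = get_locked_ptr(stream_lock, msblk->stream);
	/* ... run the decompressor with this CPU's stream ... */
	put_locked_ptr(stream_lock, stream);
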
Mike Galbraith bc7cc87312 drivers/zram: Don't disable preemption in zcomp_stream_get/put()
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Mike Galbraith c6cc729f9a drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Mike Galbraith caa6dbd8c9 connector/cn_proc: Protect send_msg() with a local lock on RT
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G        W   E   4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Since ab8ed95108 ("connector: fix out-of-order cn_proc netlink message
delivery") which is v4.7-rc6.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:35 +03:00
Thomas Gleixner 8ca7ff188f mips: Disable highmem on RT
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior a0aa1749a6 POWERPC: Allow to enable RT
Allow RT to be selected.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Sebastian Andrzej Siewior 17dfb2be0b powerpc/stackprotector: work around stack-guard init from atomic
This is invoked from the secondary CPU in atomic context. On x86 we use
the TSC instead. On Power we XOR it against mftb(), so let's use the stack
address as the initial value.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00
Thomas Gleixner a45b2dd64c powerpc: Disable highmem on RT
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2023-03-25 04:21:34 +03:00
Bogdan Purcareata d9914f69e3 powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both these numbers
and cause a DoS.

This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2023-03-25 04:21:34 +03:00